IEEE, 12th Annual Conference on Privacy, Security and Trust, PST 2014
MindYourPrivacy: Design and Implementation of a Visualization System for Third-Party Web Tracking
Yuuki Takano, Satoshi Ohta, Takeshi Takahashi, Ruo Ando, Tomoya Inoue
Introduction
❖ Third-party Web tracking is growing each year.
❖ Online privacy is now a significant issue.
❖ SNSs and targeted ads can associate the real names of individuals with tracking information.
❖ We propose MindYourPrivacy to visualize and show third-party web tracking.
❖ deep-packet-inspection-based architecture
❖ supports heterogeneous browsers and devices
❖ We tested MindYourPrivacy at a workshop (WIDE Camp 2013 Autumn in Japan), which had 129 attendees.
❖ The results reveal that clustering the web graph built from user traffic helps detect ad sites.
❖ Some graph-theoretic features also help to detect ad sites heuristically.
Related Work
Web Tracking Mechanism
❖ Third-party Web trackers typically track users with cookies, ETags, or Flash storage.
Diagram labels: first-party Web servers (contents); third-party Web tracker (contents); web bug (1x1 image), ads, social widgets; tracking ID (cookie, ETag, Flash storage, etc.)
platform.twitter.com
guest_id=v1%3A135875454567229819
twll=l%3D1363156464
Yes. Twitter knows our tendencies.
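The cookie values above are percent-encoded, which makes them harder to read. As a minimal sketch (the helper name `decode_cookie_header` is hypothetical, and the input simply joins the two values shown on the slide into one Cookie header), they can be decoded with the Python standard library:

```python
from urllib.parse import unquote


def decode_cookie_header(header: str) -> dict:
    """Split a Cookie header into a name -> percent-decoded value mapping."""
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first "=" only; the value itself may contain "=".
        name, _, value = pair.partition("=")
        cookies[name] = unquote(value)
    return cookies


decoded = decode_cookie_header(
    "guest_id=v1%3A135875454567229819; twll=l%3D1363156464"
)
print(decoded)
```

An identifier such as `guest_id` that stays the same while the widget is loaded from different first-party pages is the kind of value that lets a third party link those visits together.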
Related Work
Web Tracking Detection Techniques
❖ ShareMeNot
❖ swaps links to known data-collection sites such as Facebook
❖ Roesner et al., “Detecting and defending against third-party tracking on the web”, USENIX NSDI 2012
❖ Lightbeam
❖ visualizes the web graph between first-party and third-party sites
❖ https://www.mozilla.org/lightbeam/
❖ AdBlock Plus
❖ signature-based ad detection and blocking
❖ https://adblockplus.org/en/firefox
Related Work
Measurements
❖ Several researchers have reported on third-party web trackers.
❖ One study reported on third-party trackers within Alexa’s top 500 domains.
❖ Roesner et al., “Detecting and defending against third-party tracking on the web”, USENIX NSDI 2012
Figure 6: Prevalence of Trackers on Top 500 Domains.
Trackers are counted on domains, i.e., if a particular tracker
appears on two pages of a domain, it is counted once.
Top 20 Trackers on Alexa’s Top 500 Domains
[Roesner et al. NSDI 2012]
MindYourPrivacy
Design Principle
❖ We designed and implemented a visualization system for third-party web tracking called MindYourPrivacy.
❖ Its goal is to show third-party web trackers clearly to users.
❖ Design Principles of MindYourPrivacy
❖ Independence from browsers and devices
❖ the variety of OSes and devices, such as Linux, Windows, and MacOS, and smartphone OSes such as Android and iOS, complicates the problem
❖ adopts a deep-packet-inspection-based approach to support heterogeneous browsers and devices
❖ Accessibility and comprehensiveness of the analysis results
❖ easy to access: MindYourPrivacy provides analysis results as an HTML file via an HTTP server so that users can reach them easily
❖ easy to understand: visualizes trackers as a tag cloud and provides web graph files for further analysis
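The web graph files mentioned above could, for example, be plain-text edge lists in the .dot (Graphviz) and .sif (Cytoscape simple interaction) formats. The following is a minimal sketch of such an export, not the authors' generator; the function names, the graph name `referred`, and the interaction label `refers` are assumptions made for illustration:

```python
def to_dot(edges):
    """Render directed (referring site, referred site) edges as Graphviz DOT."""
    lines = ["digraph referred {"]
    for src, dst in sorted(edges):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


def to_sif(edges, interaction="refers"):
    """Render edges as SIF lines: source, interaction type, target (tab-separated)."""
    return "\n".join(f"{src}\t{interaction}\t{dst}" for src, dst in sorted(edges))


sample_edges = {
    ("news.example.com", "twitter.com"),
    ("blog.example.org", "twitter.com"),
}
print(to_dot(sample_edges))
print(to_sif(sample_edges))
```

Both formats are plain text, so the same edge set can be opened in Graphviz or Cytoscape without any conversion step.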
Design and Implementation
Web Tracker Identification Methodology (1)
❖ HTTP Referrer Web Graph Analysis
❖ generates a web graph from the HTTP Referer header
❖ if a site is referred to by many other sites, MindYourPrivacy assumes that it is a suspicious site tracking users
❖ Domain Aggregation
❖ to show users which organizations track them, MindYourPrivacy aggregates domains to either the second or third level
❖ platform.twitter.com and platform0.twitter.com are aggregated to twitter.com
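The two steps above can be sketched as follows. This is an illustrative approximation, not MindYourPrivacy's code: the rule for choosing between second-level and third-level aggregation is assumed to be a small hand-written suffix list (`THIRD_LEVEL_SUFFIXES` is hypothetical and far from complete), and all function names are made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: suffixes under which organizations register at the third level.
THIRD_LEVEL_SUFFIXES = {"co.jp", "ne.jp", "or.jp", "co.uk"}


def aggregate_domain(host: str) -> str:
    """Collapse a host name to its second- or third-level domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    if ".".join(labels[-2:]) in THIRD_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def build_referrer_graph(requests):
    """Map each requested domain to the set of domains that referred to it.

    `requests` is an iterable of (Host header, Referer URL or None) pairs.
    """
    referred_by = defaultdict(set)
    for host, referer in requests:
        if not referer:
            continue
        source_host = urlsplit(referer).hostname
        if source_host is None:
            continue
        src, dst = aggregate_domain(source_host), aggregate_domain(host)
        if src != dst:  # skip first-party (same-organization) requests
            referred_by[dst].add(src)
    return dict(referred_by)


graph = build_referrer_graph([
    ("platform.twitter.com", "http://news.example.com/story"),
    ("platform0.twitter.com", "http://blog.example.org/"),
    ("static.example.com", "http://news.example.com/story"),
])
print(graph)
```

Because both Twitter hosts collapse to `twitter.com`, the two referring sites end up on a single node, which is what makes heavily referred organizations stand out.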
Design and Implementation
Web Tracker Identification Methodology (2)
❖ DNS-SOA-Record-Based Grouping
❖ aggregates domains by DNS SOA record
❖ facebook.com and facebook.net are aggregated into dns.facebook.com, the name given in their DNS SOA records
❖ Krishnamurthy and Wills, “Privacy diffusion on the web: a longitudinal perspective”, WWW 2009
❖ Weighted Site Ranking of User Data Leakage
❖ MindYourPrivacy shows not only web trackers but also the sites that leak data to trackers
❖ leaking sites are scored; the details are omitted here (see our paper)
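SOA-based grouping can be illustrated without touching the network by passing the lookup in as a function. In this sketch the name `group_by_soa` and the `SAMPLE_SOA` table are assumptions; only the facebook.com and facebook.net entries come from the slide, and a real deployment would replace the table with actual DNS SOA queries.

```python
from collections import defaultdict


def group_by_soa(domains, soa_lookup):
    """Group domains under the name returned by `soa_lookup`.

    `soa_lookup(domain)` returns the SOA name or None; domains without
    an answer form a group of their own.
    """
    groups = defaultdict(list)
    for domain in sorted(set(domains)):
        key = soa_lookup(domain) or domain
        groups[key].append(domain)
    return dict(groups)


# Stand-in for real SOA queries (illustrative lookup table).
SAMPLE_SOA = {
    "facebook.com": "dns.facebook.com",
    "facebook.net": "dns.facebook.com",
}

groups = group_by_soa(
    ["facebook.com", "facebook.net", "example.org"], SAMPLE_SOA.get
)
print(groups)
```

Injecting the lookup keeps the grouping logic testable and lets results be cached, since many captured hosts share the same zone.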
Design and Implementation
System Model
❖ MindYourPrivacy captures the traffic of users’ web access
❖ analyzed results are shown via MindYourPrivacy’s web server
❖ users need not install or configure specific applications
Diagram labels: Users, Router, MindYourPrivacy, The Internet; Web Access, Outgoing Traffic, Traffic Capture, Analyzed Result via HTTP
Design and Implementation
Implementation Architecture
❖ Catenaccio DPI
❖ captures traffic from the network interface
❖ reconstructs TCP streams and stores the captured data in the NoSQL DB
❖ written in C++
❖ NoSQL DB
❖ uses MongoDB as the database
❖ Tracking Analyzer
❖ analyzes measurement data
❖ written in JavaScript and Python
❖ HTML/Graph File Generator
❖ generates visualized results
❖ written in Python
❖ HTML Server
❖ serves HTML/Graph files to users
Diagram labels: NW/IF, L2 datagram, Catenaccio DPI, measurement data, NoSQL DB, Tracking Analyzer, analyzed result, HTML/Graph File Generator, HTML/Graph files, HTML Server
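Once a TCP stream has been reconstructed, the analysis only needs a handful of HTTP request headers. The sketch below shows one way to pull them out of the raw request bytes; it is written in Python for illustration and is not Catenaccio's C++ code. The function name and the shape of the returned record are assumptions.

```python
def parse_http_request(raw: bytes) -> dict:
    """Extract the fields used for tracking analysis from one HTTP request."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # ignore malformed lines without a colon
            headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "target": target,
        "host": headers.get("host"),
        "referer": headers.get("referer"),
        "cookie": headers.get("cookie"),
        "dnt": headers.get("dnt") == "1",
    }


record = parse_http_request(
    b"GET /widgets.js HTTP/1.1\r\n"
    b"Host: platform.twitter.com\r\n"
    b"Referer: http://news.example.com/story\r\n"
    b"Cookie: guest_id=v1%3A135875454567229819\r\n"
    b"DNT: 1\r\n"
    b"\r\n"
)
print(record)
```

A record of this shape is small enough to store as one document per request in a document database such as MongoDB.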
Design and Implementation
Web User Interface
❖ visualizes suspicious web trackers as a tag cloud
❖ domains are grouped by DNS SOA record
❖ referring sites are shown in the right pane
Experiment at WIDE Camp 2013 Autumn
❖ We ran MindYourPrivacy at WIDE Camp 2013 Autumn.
❖ WIDE Camp 2013 Autumn (Sep. 10 - Sep. 13)
❖ a workshop for Internet researchers, operators, and developers
❖ 129 attendees, most of whom are either IT specialists or students majoring in IT
❖ every attendee agreed to the experiment (for research purposes only)
❖ We captured and analyzed the attendees’ web browsing traffic.
Experiment
User Traffic Analysis (1)
❖ Obtained 734,194 HTTP requests and 1,661 individual source IP addresses (IPv4 and IPv6).
❖ A directed web graph is generated from the HTTP Referer header.
❖ It has 3,966 nodes and 12,941 edges.
❖ We analyze this web graph to find web trackers.
Experiment
User Traffic Analysis (2)
❖ To find web trackers, we extract the most-referred sites from the web graph.
❖ Advertisement and social sites, which tend to track users, have many incoming links.
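Ranking sites by incoming links amounts to counting in-degrees in the directed referrer graph. A minimal sketch, assuming the graph is available as (referring site, referred site) pairs; the function name and sample edges are invented for the example:

```python
from collections import Counter


def top_referred(edges, n=5):
    """Return the n sites with the most distinct incoming links."""
    # set() drops duplicate edges so repeated requests count once.
    indegree = Counter(dst for _src, dst in set(edges))
    # Sort by count (descending), then by name for a stable order.
    return sorted(indegree.items(), key=lambda item: (-item[1], item[0]))[:n]


sample_edges = [
    ("news.example.com", "google-analytics.com"),
    ("blog.example.org", "google-analytics.com"),
    ("shop.example.net", "google-analytics.com"),
    ("news.example.com", "twitter.com"),
    ("blog.example.org", "twitter.com"),
    ("news.example.com", "twitter.com"),  # duplicate edge
    ("news.example.com", "cdn.example.com"),
]
print(top_referred(sample_edges, n=2))
```

Counting distinct referring sites rather than raw requests keeps one very busy page from inflating a site's rank.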
TABLE II: Top-five Most-referred Sites
Site                    # of incoming links
google-analytics.com    847
facebook.com            437
twitter.com             393
doubleclick.net         380
google.com              356
Experiment
User Traffic Analysis (3)
❖ We then applied a clustering technique (MCODE) to the web graph.
❖ As a result, many ad sites were found in the rank 1 cluster.
Referred graph files are provided in .dot and .sif formats for further analysis or visualization with tools such as Cytoscape.
Ad sites found in the rank 1 cluster by MCODE:
doubleclick.net, amazon-adsystem.com, googleadservices.com, i-mobile.co.jp, advg.jp, adingo.jp, iogous.com, admeld.com, criteo.com
Experiment
User Traffic Analysis (4)
❖ We analyzed the cluster in terms of graph-theoretic features.
❖ We found that the number of incoming links, the number of outgoing links, and the neighborhood connectivity of ad sites differ markedly from those of other sites.
❖ ad sites have many incoming links but few outgoing links
❖ the neighborhood connectivity of ad sites is relatively low
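The three features can be computed directly from the edge list. The sketch below assumes that neighborhood connectivity means the average number of neighbors of a node's neighbors, with edge direction ignored for that one feature; that reading, and all names in the code, are assumptions rather than details taken from the slides. `summarize` uses `statistics.variance`, which is the unbiased (n - 1) sample variance.

```python
from collections import defaultdict
from statistics import mean, variance


def node_features(edges):
    """Compute in-degree, out-degree, and neighborhood connectivity per node."""
    preds, succs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        succs[src].add(dst)
        preds[dst].add(src)
    nodes = set(preds) | set(succs)
    neighbors = {n: preds[n] | succs[n] for n in nodes}
    features = {}
    for n in nodes:
        features[n] = {
            "in": len(preds[n]),
            "out": len(succs[n]),
            # Average neighbor count over this node's neighbors.
            "neighborhood_connectivity": mean(
                len(neighbors[m]) for m in neighbors[n]
            ),
        }
    return features


def summarize(values):
    """Return (average, unbiased variance) of a list of at least two numbers."""
    return mean(values), variance(values)


features = node_features([
    ("a", "ad"), ("b", "ad"), ("c", "ad"), ("a", "b"),
])
print(features["ad"])
print(summarize([f["in"] for f in features.values()]))
```

In this toy graph the node `ad` has three incoming links and none outgoing, the same pattern the slide reports for ad sites.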
Fig. 6: Rank 1 Cluster by MCODE (include loops = false,
degree cutoff = 2, haircut = true, fluff = false, node score
cutoff = 0.2, k-core = 2, and max. depth = 100)
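MCODE itself involves more than this (it weights nodes by local density and grows clusters from seed nodes), so the following is not an MCODE implementation. It only illustrates the k-core idea behind the `k-core = 2` parameter in the caption: repeatedly remove nodes with fewer than k neighbors until none remain. The function name and the sample graph are assumptions, and the graph is treated as undirected.

```python
def k_core(adjacency, k):
    """Return the k-core of an undirected graph given as node -> set of neighbors."""
    # Copy the input and drop self-loops so the caller's data is untouched.
    adj = {node: set(nbrs) - {node} for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for other in adj.pop(node):
                # The neighbor may already have been removed in this pass.
                adj.get(other, set()).discard(node)
            changed = True
    return adj


sample = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},  # pendant node, removed by the 2-core
}
print(sorted(k_core(sample, 2)))
```

With k = 2 the pendant node `d` is pruned and the triangle of `a`, `b`, and `c` survives.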
TABLE IV: Feature Vector of Rank 1 Cluster’s Edge (Average and Unbiased Variance)
            # of incoming links    # of outgoing links    Neighborhood connectivity
            avg.     var.          avg.     var.          avg.     var.
ad sites    90.2     12405.4       15.2     3972.9        46.0     3972.9
others      30.2     3972.9        29.7     569.3         130.2    5212.0
Experiment
User Traffic Analysis (5)
❖ The Do Not Track (DNT) flag announces to third-party trackers that a user does not wish to be tracked.
❖ However, only 40,650 (40,605/734,194 = 6 %) DNT-enabled requests were observed.
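Measuring DNT adoption comes down to counting requests that carry the header `DNT: 1`. A minimal sketch, assuming each captured request is available as a dictionary of header names to values (the function name and sample data are invented); the last line evaluates the fraction shown on the slide:

```python
def dnt_share(requests):
    """Return the fraction of requests that carry the header DNT: 1."""
    total = enabled = 0
    for headers in requests:
        total += 1
        # Header names are case-insensitive, so compare in lower case.
        if any(
            name.lower() == "dnt" and value.strip() == "1"
            for name, value in headers.items()
        ):
            enabled += 1
    return enabled / total if total else 0.0


sample_requests = [
    {"Host": "a.example.com", "DNT": "1"},
    {"Host": "b.example.com"},
    {"Host": "c.example.com", "dnt": "0"},
    {"Host": "d.example.com", "Dnt": "1"},
]
print(dnt_share(sample_requests))
print(f"{40605 / 734194:.1%}")  # the fraction from the slide, about 5.5%
```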
Conclusion and Future Work
❖ Proposed a visualization system for third-party web tracking called MindYourPrivacy.
❖ browser- and device-independent architecture
❖ visualizes web trackers as a tag cloud
❖ Ran MindYourPrivacy at WIDE Camp 2013 Autumn and analyzed users’ web browsing traffic.
❖ generated a web graph from the HTTP Referer header and analyzed it
❖ revealed that graph clustering and some graph-theoretic features are useful for finding web trackers
❖ Future work: adopting the more sophisticated approaches revealed by the experiment, and a signature-based approach.

More Related Content

Similar to Visualize & Detect Third-Party Web Tracking

IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET Journal
 
Yelpcamp: A review based website for campgrounds
Yelpcamp: A review based website for campgroundsYelpcamp: A review based website for campgrounds
Yelpcamp: A review based website for campgroundsIRJET Journal
 
Making Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content ManagementMaking Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content ManagementAmplexor
 
Detecting eCommerce Fraud with Neo4j and Linkurious
Detecting eCommerce Fraud with Neo4j and LinkuriousDetecting eCommerce Fraud with Neo4j and Linkurious
Detecting eCommerce Fraud with Neo4j and LinkuriousNeo4j
 
Open / Public APIs - From Implementation to Digital Business Model
Open / Public APIs - From Implementation to Digital Business ModelOpen / Public APIs - From Implementation to Digital Business Model
Open / Public APIs - From Implementation to Digital Business ModelBastian Migge
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...IJCNCJournal
 
Smart Crawler Automation with RMI
Smart Crawler Automation with RMISmart Crawler Automation with RMI
Smart Crawler Automation with RMIIRJET Journal
 
Advanced internet technologies
Advanced internet technologiesAdvanced internet technologies
Advanced internet technologieschirag patil
 
Operating System Upgrade Implementation Report And...
Operating System Upgrade Implementation Report And...Operating System Upgrade Implementation Report And...
Operating System Upgrade Implementation Report And...Julie Kwhl
 
Detection of Phishing Websites
Detection of Phishing WebsitesDetection of Phishing Websites
Detection of Phishing WebsitesIRJET Journal
 
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING ...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM  FOR E-COMMERCE WEBSITES USERS USING ...DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM  FOR E-COMMERCE WEBSITES USERS USING ...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING ...kevig
 
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...ijnlc
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine ScrapperIRJET Journal
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyIOSR Journals
 

Similar to Visualize & Detect Third-Party Web Tracking (20)

What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
CIS1203 Web Design Principles - Part 1
CIS1203 Web Design Principles - Part 1CIS1203 Web Design Principles - Part 1
CIS1203 Web Design Principles - Part 1
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
 
Yelpcamp: A review based website for campgrounds
Yelpcamp: A review based website for campgroundsYelpcamp: A review based website for campgrounds
Yelpcamp: A review based website for campgrounds
 
We are Digital Puppets
We are Digital PuppetsWe are Digital Puppets
We are Digital Puppets
 
Making Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content ManagementMaking Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content Management
 
Web Engineering
Web EngineeringWeb Engineering
Web Engineering
 
Detecting eCommerce Fraud with Neo4j and Linkurious
Detecting eCommerce Fraud with Neo4j and LinkuriousDetecting eCommerce Fraud with Neo4j and Linkurious
Detecting eCommerce Fraud with Neo4j and Linkurious
 
Open / Public APIs - From Implementation to Digital Business Model
Open / Public APIs - From Implementation to Digital Business ModelOpen / Public APIs - From Implementation to Digital Business Model
Open / Public APIs - From Implementation to Digital Business Model
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
 
Smart Crawler Automation with RMI
Smart Crawler Automation with RMISmart Crawler Automation with RMI
Smart Crawler Automation with RMI
 
Advanced internet technologies
Advanced internet technologiesAdvanced internet technologies
Advanced internet technologies
 
Operating System Upgrade Implementation Report And...
Operating System Upgrade Implementation Report And...Operating System Upgrade Implementation Report And...
Operating System Upgrade Implementation Report And...
 
Detection of Phishing Websites
Detection of Phishing WebsitesDetection of Phishing Websites
Detection of Phishing Websites
 
Deep Web
Deep WebDeep Web
Deep Web
 
Trends in front end engineering_handouts
Trends in front end engineering_handoutsTrends in front end engineering_handouts
Trends in front end engineering_handouts
 
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING ...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM  FOR E-COMMERCE WEBSITES USERS USING ...DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM  FOR E-COMMERCE WEBSITES USERS USING ...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING ...
 
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine Scrapper
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET Technology
 

More from Yuuki Takano

アクターモデル
アクターモデルアクターモデル
アクターモデルYuuki Takano
 
FARIS: Fast and Memory-efficient URL Filter by Domain Specific Machine
FARIS: Fast and Memory-efficient URL Filter by Domain Specific MachineFARIS: Fast and Memory-efficient URL Filter by Domain Specific Machine
FARIS: Fast and Memory-efficient URL Filter by Domain Specific MachineYuuki Takano
 
リアクティブプログラミング
リアクティブプログラミングリアクティブプログラミング
リアクティブプログラミングYuuki Takano
 
Transactional Memory
Transactional MemoryTransactional Memory
Transactional MemoryYuuki Takano
 
Tutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorTutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorYuuki Takano
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】
【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】
【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】Yuuki Takano
 
SF-TAP: L7レベルネットワークトラフィック解析器
SF-TAP: L7レベルネットワークトラフィック解析器SF-TAP: L7レベルネットワークトラフィック解析器
SF-TAP: L7レベルネットワークトラフィック解析器Yuuki Takano
 
SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計
SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計
SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計Yuuki Takano
 
Measurement Study of Open Resolvers and DNS Server Version
Measurement Study of Open Resolvers and DNS Server VersionMeasurement Study of Open Resolvers and DNS Server Version
Measurement Study of Open Resolvers and DNS Server VersionYuuki Takano
 
Security workshop 20131220
Security workshop 20131220Security workshop 20131220
Security workshop 20131220Yuuki Takano
 
Security workshop 20131213
Security workshop 20131213Security workshop 20131213
Security workshop 20131213Yuuki Takano
 
Security workshop 20131127
Security workshop 20131127Security workshop 20131127
Security workshop 20131127Yuuki Takano
 
A Measurement Study of Open Resolvers and DNS Server Version
A Measurement Study of Open Resolvers and DNS Server VersionA Measurement Study of Open Resolvers and DNS Server Version
A Measurement Study of Open Resolvers and DNS Server VersionYuuki Takano
 

More from Yuuki Takano (16)

アクターモデル
アクターモデルアクターモデル
アクターモデル
 
π計算
π計算π計算
π計算
 
FARIS: Fast and Memory-efficient URL Filter by Domain Specific Machine
FARIS: Fast and Memory-efficient URL Filter by Domain Specific MachineFARIS: Fast and Memory-efficient URL Filter by Domain Specific Machine
FARIS: Fast and Memory-efficient URL Filter by Domain Specific Machine
 
リアクティブプログラミング
リアクティブプログラミングリアクティブプログラミング
リアクティブプログラミング
 
Transactional Memory
Transactional MemoryTransactional Memory
Transactional Memory
 
Tutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorTutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow Abstractor
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
CUDAメモ
CUDAメモCUDAメモ
CUDAメモ
 
【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】
【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】
【やってみた】リーマン多様体へのグラフ描画アルゴリズムの実装【実装してみた】
 
SF-TAP: L7レベルネットワークトラフィック解析器
SF-TAP: L7レベルネットワークトラフィック解析器SF-TAP: L7レベルネットワークトラフィック解析器
SF-TAP: L7レベルネットワークトラフィック解析器
 
SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計
SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計
SF-TAP: 柔軟で規模追従可能なトラフィック解析基盤の設計
 
Measurement Study of Open Resolvers and DNS Server Version
Measurement Study of Open Resolvers and DNS Server VersionMeasurement Study of Open Resolvers and DNS Server Version
Measurement Study of Open Resolvers and DNS Server Version
 
Security workshop 20131220
Security workshop 20131220Security workshop 20131220
Security workshop 20131220
 
Security workshop 20131213
Security workshop 20131213Security workshop 20131213
Security workshop 20131213
 
Security workshop 20131127
Security workshop 20131127Security workshop 20131127
Security workshop 20131127
 
A Measurement Study of Open Resolvers and DNS Server Version
A Measurement Study of Open Resolvers and DNS Server VersionA Measurement Study of Open Resolvers and DNS Server Version
A Measurement Study of Open Resolvers and DNS Server Version
 

Recently uploaded

Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Roomdivyansh0kumar0
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersDamian Radcliffe
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirtrahman018755
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...Diya Sharma
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...SofiyaSharma5
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Standkumarajju5765
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 

Recently uploaded (20)

Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...

Visualize & Detect Third-Party Web Tracking

  • 1. IEEE, 12th Annual Conference on Privacy, Security and Trust, PST 2014. MindYourPrivacy: Design and Implementation of a Visualization System for Third-Party Web Tracking. Yuuki Takano, Satoshi Ohta, Takeshi Takahashi, Ruo Ando, Tomoya Inoue
  • 2. Introduction ❖ Third-party web tracking is growing each year. ❖ online privacy is now a significant issue ❖ SNSs and targeted ads can associate the real names of individuals with tracking information ❖ We propose MindYourPrivacy to visualize third-party web tracking. ❖ deep-packet-inspection-based architecture ❖ supports heterogeneous browsers and devices ❖ We evaluated MindYourPrivacy at a workshop (WIDE Camp 2013 Autumn in Japan) with 129 attendees. ❖ analysis of user traffic revealed that clustering the web graph helps to detect ad sites ❖ some graph-theoretic features also help to detect ad sites heuristically
  • 3. Related Work: Web Tracking Mechanism ❖ Third-party web trackers typically track users by cookies, ETags, or Flash storage. [Diagram labels: first-party web servers; third-party web tracker; contents; tracking ID (cookie, ETags, Flash storage, etc.); web bug (1x1 image), ads, social widgets]
  • 6. Related Work: Web Tracking Detection Techniques ❖ ShareMeNot ❖ swaps links to known data-collection sites such as Facebook ❖ Roesner et al., “Detecting and defending against third-party tracking on the web”, USENIX NSDI 2012 ❖ Lightbeam ❖ visualizes the web graph between first- and third-party sites ❖ https://www.mozilla.org/lightbeam/ ❖ AdBlock Plus ❖ signature-based ad detection and blocking ❖ https://adblockplus.org/en/firefox
  • 7. Related Work: Measurements ❖ Several researchers have reported on third-party web trackers. ❖ One study reported on third-party trackers within Alexa’s top 500 domains. ❖ Roesner et al., “Detecting and defending against third-party tracking on the web”, USENIX NSDI 2012 ❖ Figure 6 [Roesner et al., NSDI 2012]: Prevalence of Trackers on Top 500 Domains (top 20 trackers on Alexa’s top 500 domains). Trackers are counted on domains, i.e., if a particular tracker appears on two pages of a domain, it is counted once.
  • 8. MindYourPrivacy: Design Principles ❖ We designed and implemented a visualization system for third-party web tracking called MindYourPrivacy. ❖ Its goal is to clearly show third-party web trackers to users. ❖ Design principles of MindYourPrivacy ❖ Independence from browsers and devices ❖ the variety of OSes and devices (Linux, Windows, MacOS, and smartphone OSes such as Android and iOS) complicates the problem ❖ adopt a deep-packet-inspection-based approach to support heterogeneous browsers and devices ❖ Accessibility and comprehensibility of the analysis results ❖ easy to access: MindYourPrivacy provides analysis results as an HTML file via an HTTP server ❖ easy to understand: trackers are visualized as a tag cloud, and web graph files are provided for further analysis
  • 9. Design and Implementation: Web Tracker Identification Methodology (1) ❖ HTTP Referrer Web Graph Analysis ❖ generate a web graph by using the HTTP Referer header ❖ if a site is referred to by many other sites, MindYourPrivacy assumes that it is a suspicious site tracking users ❖ Domain Aggregation ❖ to show users which organizations track them, MindYourPrivacy aggregates domains at either the second or third level ❖ platform.twitter.com and platform0.twitter.com are aggregated to twitter.com
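The referrer-graph and domain-aggregation steps on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `levels` parameter, and the sample requests are assumptions made for illustration, and the aggregation rule simply keeps the last two (or three) labels of a host name. How MindYourPrivacy chooses between second and third level is not stated on the slide.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def aggregate(host, levels=2):
    """Keep only the last `levels` labels of a host name (naive aggregation)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-levels:])


def build_referrer_graph(requests):
    """Return {referred_site: set(referring_sites)} from (url, referrer) pairs."""
    incoming = defaultdict(set)
    for url, referrer in requests:
        if not referrer:
            continue  # no Referer header, so no edge can be drawn
        src = aggregate(urlsplit(referrer).hostname or "")
        dst = aggregate(urlsplit(url).hostname or "")
        if src and dst and src != dst:
            incoming[dst].add(src)  # directed edge: src -> dst
    return incoming


def most_referred(incoming, top=5):
    """Rank sites by the number of distinct sites that refer to them."""
    ranked = sorted(incoming.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(site, len(srcs)) for site, srcs in ranked[:top]]


# Hypothetical (request URL, Referer) pairs, made up for this example.
sample = [
    ("http://platform.twitter.com/widgets.js", "http://news.example.com/a"),
    ("http://platform0.twitter.com/w.js", "http://blog.example.org/b"),
    ("http://www.google-analytics.com/ga.js", "http://news.example.com/a"),
    ("http://news.example.com/style.css", "http://news.example.com/a"),
]
graph = build_referrer_graph(sample)
print(most_referred(graph))  # [('twitter.com', 2), ('google-analytics.com', 1)]
```

Sites that many distinct sites refer to rise to the top of the ranking, which is the heuristic the slide describes. Passing `levels=3` would keep names such as i-mobile.co.jp intact.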
  • 10. Design and Implementation: Web Tracker Identification Methodology (2) ❖ DNS-SOA-Record-Based Grouping ❖ aggregate domains by DNS SOA record ❖ facebook.com and facebook.net are aggregated into dns.facebook.com, which is their DNS SOA record ❖ Krishnamurthy and Wills, “Privacy diffusion on the web: a longitudinal perspective”, WWW 2009 ❖ Weighted Site Ranking of User Data Leakage ❖ MindYourPrivacy shows not only web trackers but also the sites that leak user data to trackers ❖ leaking sites are scored; the details are omitted here (see our paper)
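A rough sketch of SOA-based grouping follows. To stay runnable offline, it takes the SOA lookup as a caller-supplied function instead of querying DNS. `group_by_soa`, `FAKE_SOA`, and the twitter.com entry are hypothetical, and using the SOA record's primary name server field (MNAME) as the grouping key is an assumption about what the slide means by "DNS SOA record". Only the facebook.com and facebook.net mapping comes from the slide.

```python
from collections import defaultdict


def group_by_soa(domains, soa_lookup):
    """Group domains by the primary name server (MNAME) of their SOA record.

    `soa_lookup` is a caller-supplied function: domain -> SOA MNAME or None.
    """
    groups = defaultdict(list)
    for domain in domains:
        key = soa_lookup(domain) or domain  # no SOA known: keep the domain alone
        groups[key.rstrip(".").lower()].append(domain)
    return {key: sorted(members) for key, members in groups.items()}


# Hypothetical lookup table standing in for real DNS queries.
FAKE_SOA = {
    "facebook.com": "dns.facebook.com.",   # mapping taken from the slide
    "facebook.net": "dns.facebook.com.",   # mapping taken from the slide
    "twitter.com": "ns1.example-dns.net.",  # invented for illustration
}

domains = ["facebook.com", "facebook.net", "twitter.com", "example.org"]
groups = group_by_soa(domains, FAKE_SOA.get)
print(groups["dns.facebook.com"])  # ['facebook.com', 'facebook.net']
```

In a real deployment the lookup function would be backed by a DNS resolver. Injecting it keeps the grouping logic testable and makes it easy to cache answers.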
  • 11. Design and Implementation: System Model ❖ MindYourPrivacy captures the traffic of users’ web access ❖ analyzed results are shown via MindYourPrivacy’s web server ❖ users need not install or configure specific applications [Diagram labels: Users; Router; The Internet; MindYourPrivacy; Web Access; Outgoing Traffic; Traffic Capture; Analyzed Result via HTTP]
  • 12. Design and Implementation: Implementation Architecture ❖ Catenaccio DPI ❖ captures traffic from the network IF ❖ reconstructs TCP streams and stores captured data in a NoSQL DB ❖ written in C++ ❖ NoSQL DB ❖ uses MongoDB as the database ❖ Tracking Analyzer ❖ analyzes measurement data ❖ written in JavaScript and Python ❖ HTML/Graph File Generator ❖ generates visualized results ❖ written in Python ❖ HTML Server ❖ serves HTML/Graph files to users [Diagram labels: NW/IF; L2 Datagram; Catenaccio DPI; Measurement Data; NoSQL DB; Tracking Analyzer; Analyzed Result; HTML/Graph File Generator; HTML/Graph Files; HTML Server]
  • 13. Design and Implementation: Web User Interface ❖ suspicious web trackers are visualized as a tag cloud ❖ domains are grouped by DNS SOA records ❖ referring sites are shown in the right pane
  • 14. Experiment at WIDE Camp 2013 Autumn ❖ We evaluated MindYourPrivacy at WIDE Camp 2013 Autumn. ❖ WIDE Camp 2013 Autumn (Sep. 10 - Sep. 13) ❖ a workshop for Internet researchers, operators and developers ❖ 129 attendees, most of whom are either IT specialists or students majoring in IT ❖ every attendee agreed to the experiment (for research purposes only) ❖ We captured and analyzed the attendees’ web browsing traffic.
  • 15. Experiment: User Traffic Analysis (1) ❖ We obtained 734,194 HTTP requests from 1,661 individual source IP addresses (IPv4 and IPv6). ❖ A directed web graph was generated from the HTTP Referer header. ❖ The graph has 3,966 nodes and 12,941 edges. ❖ We analyzed this web graph to find web trackers.
  • 16. Experiment: User Traffic Analysis (2) ❖ To find web trackers, we extracted the most-referred sites from the web graph. ❖ Advertising and social sites, which tend to track users, have many incoming links.
    TABLE II: Top-five Most-referred Sites
    Site                     # of incoming links
    google-analytics.com     847
    facebook.com             437
    twitter.com              393
    doubleclick.net          380
    google.com               356
  • 17. Experiment: User Traffic Analysis (3) ❖ We then applied a clustering technique (MCODE) to the web graph. ❖ As a result, many ad sites were found in the rank 1 cluster. ❖ Ad sites found in the rank 1 cluster: doubleclick.net, amazon-adsystem.com, googleadservices.com, i-mobile.co.jp, advg.jp, adingo.jp, iogous.com, admeld.com, criteo.com. ❖ Ad sites generally tend to collect user information for business purposes, so we should be concerned about the privacy issues they present. ❖ [Paper excerpt shown on the slide, partly cropped] The referred graph can be downloaded in .dot and .sif formats and analyzed or visualized with tools such as Cytoscape. In total, 2,309 and 2,671 requests were observed for platform.twitter.com and www.facebook.com, respectively, but only about 100 unique values were found for each cookie (the fr cookie of www.facebook.com had 397 and thus does not seem to be a tracking cookie). The figure of 100 likely indicates the number of attendees (also around 100) or devices, which suggests that tracking cookies can also be used for per-user analysis and visualization.
  • 18. Experiment: User Traffic Analysis (4) ❖ We analyzed the cluster in terms of graph-theoretic features. ❖ We found that ad sites differ markedly from other sites in number of incoming links, number of outgoing links, and neighborhood connectivity. ❖ ad sites have many incoming links but few outgoing links ❖ ad sites’ neighborhood connectivity is relatively low
    Fig. 6: Rank 1 Cluster by MCODE (include loops = false, degree cutoff = 2, haircut = true, fluff = false, node score cutoff = 0.2, k-core = 2, and max. depth = 100)
    TABLE IV: Feature Vector of Rank 1 Cluster’s Edge (Average and Unbiased Variance)
                 # of incoming links    # of outgoing links    Neighborhood connectivity
                 avg.     var.          avg.     var.          avg.      var.
    ad sites     90.2     12405.4       15.2     3972.9        46.0      3972.9
    others       30.2     3972.9        29.7     569.3         130.2     5212.0
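The three features compared in Table IV can be computed from an edge list along the following lines. This is a sketch under assumptions, not the authors' analysis code: neighborhood connectivity is taken here to be the average number of neighbors of a node's neighbors, with edge direction ignored, and self-loops are skipped. The toy edges and the names `node_features` and `summarize` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, variance


def node_features(edges):
    """Return {node: (in_degree, out_degree, neighborhood_connectivity)}."""
    preds, succs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        succs[src].add(dst)
        preds[dst].add(src)
    nodes = set(preds) | set(succs)
    # Neighbors regardless of edge direction (an assumption of this sketch).
    neighbors = {n: preds[n] | succs[n] for n in nodes}
    features = {}
    for n in nodes:
        # Average neighbor count over this node's neighbors.
        nc = mean(len(neighbors[m]) for m in neighbors[n])
        features[n] = (len(preds[n]), len(succs[n]), nc)
    return features


def summarize(features, group):
    """Per-feature (average, unbiased variance) over the nodes in `group`."""
    columns = zip(*(features[n] for n in group))
    return [(mean(col), variance(col)) for col in map(list, columns)]


# Toy graph: three content sites that all embed one ad site.
edges = [
    ("a.example", "ads.example"),
    ("b.example", "ads.example"),
    ("c.example", "ads.example"),
    ("a.example", "b.example"),
    ("b.example", "c.example"),
]
features = node_features(edges)
print(features["ads.example"][:2])  # (3, 0): many incoming, no outgoing links
print(summarize(features, ["a.example", "b.example", "c.example"]))
```

`statistics.variance` is the sample (unbiased) variance, which matches the table's "Unbiased Variance" column, and it needs at least two nodes per group.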
  • 19. Experiment: User Traffic Analysis (5) ❖ The Do Not Track (DNT) flag announces to third-party trackers a user's wish not to be tracked. ❖ However, only about 40,600 of the 734,194 observed requests (roughly 6%) had DNT enabled.
  • 20. Conclusion and Future Work ❖ We proposed a visualization system for third-party web tracking called MindYourPrivacy. ❖ browser- and device-independent architecture ❖ visualizes web trackers as a tag cloud ❖ We evaluated MindYourPrivacy at WIDE Camp 2013 Autumn and analyzed users’ web browsing traffic. ❖ generated a web graph from the HTTP Referer header and analyzed it ❖ revealed that graph clustering and some graph-theoretic features are useful for finding web trackers ❖ Future work: adopting the more sophisticated approaches revealed by the experiment, and a signature-based approach.