Predicting repeat visitors to a website without using cookies.
Ideal for preventing identified fraudsters from re-entering the system.
Browser fingerprinting without cookies
Contents
NPD: Repeat/Unique Visitor Identification
1. Product Intro & Goal
2. Who’s it for?
3. Why Build It
4. Desired Output of the product
5. Preconditions
6. Background & Short Description
7. Success Scenario for the product
Implementation Methodology
8. Attributes for Fingerprinting
8.1 User Agent String (HTTP Header)
8.2 HTTP Requests header
8.3 Javascript Display Data
8.4 Plugin Data
8.5 HTML Canvas fingerprint
8.6 WebGL Rendering
8.7 System Fonts
8.8 Do Not Track Request
8.9 DNS/ TCP (network)
8.10 Timezone
9. Scenarios for the model
10. Observations with different devices, browsers/OS & Network
10.1 With a given browser and OS but different network connections
10.2 Same device but different browsers/ OS
11. Estimating weights for each attribute for the model
12. Suggested Methodology
13. Conclusion
14. Corner Case
15. Experimental
16. References for further reading
NPD: Repeat/Unique Visitor Identification
1. Product Intro & Goal
A web-based solution to detect whether a given user visiting your website has visited it before.
2. Who’s it for?
Partner NBFCs (non-banking financial companies) with whom prospective borrowers fill out loan applications
3. Why Build It
This solution aims to detect, in the pre-login journey itself, whether a visitor is a unique visitor or a repeat
visitor. It can be used against loan application fraud, and in ad servers to identify users uniquely even after
they flush their cookies.
4. Desired Output of the product
Confidence score of a visitor being a re-visitor
List of previous visits with time-stamp for a repeat visitor
5. Preconditions
Cookies are not allowed, as they fall outside compliance
6. Background & Short Description
When a client device connects to the web server hosting the website, it goes through several handshakes and
ends up sending network and application data. From this data, the application server can gather and interpret
the device's geolocation, network connection, browser, operating system and hardware details to uniquely
identify the client (device).
Figure: Client-server connection flow
The load balancer directs HTTP/HTTPS requests to different server instances; it may or may not be
content-aware. During the negotiation between the originating browser and the hosting server, several request
headers are passed so that the server can respond with the appropriate content. Network metadata can be
processed at the network level, or passed to the hosting server as a connection attribute and processed along
with the HTML/CSS and JavaScript data.
7. Success Scenario for the product
a. The product should predict, with a high confidence score, whether a user is a new visitor or a returning
user (H0: each user is a new visitor).
Minimum Type I and Type II errors:
b. Minimum false positives (Type I error): the user was a new visitor, but the system predicted a repeat visitor.
c. Minimum false negatives (Type II error): the user was a repeat visitor, but the system predicted a new visitor.
d. Minimum test time (the prediction should be fast)
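Both error types can be measured on a labelled test set of visits. The following is a minimal Python sketch, assuming each test visit is recorded as a hypothetical `(actually_repeat, predicted_repeat)` pair of booleans; the function name and input shape are illustrative, not part of the product spec.

```python
from typing import Iterable, Tuple


def error_rates(outcomes: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for labelled visits.

    Each outcome is a hypothetical (actually_repeat, predicted_repeat) pair.
    Under H0 ("each user is a new visitor"):
      - false positive (Type I): a new visitor predicted as a repeat visitor
      - false negative (Type II): a repeat visitor predicted as a new visitor
    """
    fp = fn = new_total = repeat_total = 0
    for actually_repeat, predicted_repeat in outcomes:
        if actually_repeat:
            repeat_total += 1
            if not predicted_repeat:
                fn += 1
        else:
            new_total += 1
            if predicted_repeat:
                fp += 1
    fpr = fp / new_total if new_total else 0.0
    fnr = fn / repeat_total if repeat_total else 0.0
    return fpr, fnr


visits = [(False, False), (False, True), (True, True), (True, False), (True, True)]
print(error_rates(visits))  # (0.5, 0.3333333333333333)
```

Rates are normalised per class (new vs. repeat) so that an imbalanced test set does not hide one error type behind the other.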
---------------------------------------------------------------------------------------------------------------------------------------------------
Implementation Methodology
8. Attributes for Fingerprinting
8.1 User Agent String (HTTP Header)
Identifies the browser and operating system
A typical user-agent string looks like:
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
8.2 HTTP Requests header
Request header fields, for example:
Cookie: device=d7834267-37fd-42ac-aa8c-1373aeebcf92; JSESSIONID=...
Host: noc.to
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
If-None-Match: "d7834267-37fd-42ac-aa8c-1373aeebcf92/2017-01-13-09:43:54.513"
Upgrade-Insecure-Requests: 1
Accept-Language: en-US,en;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/53
Connection: keep-alive
Accept-Encoding: gzip, deflate, sdch
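A header-based attribute can be reduced to a single hash on the server. The minimal sketch below assumes a hypothetical choice of headers: `Cookie` and `If-None-Match` are deliberately left out because, as in the example above, they carry a stored device identifier, which the "no cookies" precondition rules out. `Host` and `Connection` are also skipped as they describe the request rather than the browser.

```python
import hashlib

# Assumption: which headers count as browser attributes. Cookie and
# If-None-Match are excluded because they carry stored identifiers.
FINGERPRINT_HEADERS = ("accept", "accept-encoding", "accept-language",
                       "upgrade-insecure-requests", "user-agent")


def header_fingerprint(headers: dict) -> str:
    # HTTP header names are case-insensitive, so compare them lower-cased.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    # A missing header is recorded as an empty value, which is itself a signal.
    parts = [f"{name}={lowered.get(name, '')}" for name in FINGERPRINT_HEADERS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

The fixed header order makes the hash independent of the order in which the browser happened to send the headers.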
8.3 Javascript Display Data
Collected if the browser can run JavaScript (enabled by default):
Screen W x H 1366 x 768
Available W x H 1366 x 728
Color Depth 24
Pixel Ratio 1
8.4 Plugin Data
Flash Plugin Data
Plugin Version WIN 24,0,0,194
Plugin Manufacturer Google Pepper
Language en
Operating System Windows 10
CPU Architecture x86
Supports 32-Bit Processes yes
Supports 64-Bit Processes yes
Screen Resolution 1366 x 768
LSO Storage Test passed
Pixel Aspect Ratio 1
Screen DPI 72
AV Hardware Disabled no
File Read Disabled no
Has Printing yes
Has Accessibility yes
Has Audio yes
Has MP3 yes
Has Embedded Video yes
Has Screen Broadcast no
Has Screen Playback no
Has Streaming Audio yes
Has Streaming Video yes
Has Audio Encoder yes
Has Video Encoder yes
Has Input Method Editor yes
Max Level IDC 5.1
Player Type PlugIn
Is Debugger no
Has Transport Layer Security yes
Navigator Plugin List
Name Chrome PDF Viewer
Filename mhjfbmdgcfjbbpaeojofohoefgiehjai
Name Chrome PDF Viewer
Description Portable Document Format
Filename internal-pdf-viewer
Name Native Client
Filename internal-nacl-plugin
Name Shockwave Flash
Description Shockwave Flash 24.0 r0
Filename pepflashplayer.dll
Name Widevine Content Decryption Module
Description Enables Widevine licenses for playback of HTML audio/video content. (version: 1.4.8.903)
Filename widevinecdmadapter.dll
Other variables:
8.5 HTML Canvas fingerprint
With the HTML5 canvas API, text and images are rendered differently on devices with different OS, font
libraries, graphics cards, graphics drivers and browser versions. For example, the pixel map produced by a client
running Chrome v55 on Windows 10 (x64) with the renderer "ANGLE (Intel(R) HD Graphics 5500 Direct3D11 vs_5_0 ps_5_0)" reflects that particular combination.
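On the server side, the canvas result only needs to be reduced to a stable hash. This sketch assumes that client-side JavaScript (not shown) draws text onto a canvas and posts the string returned by the browser's `canvas.toDataURL()`, which by default is a `data:image/png;base64,...` URL. For simplicity it hashes the encoded PNG bytes rather than decoded pixels; the function name is hypothetical.

```python
import base64
import hashlib


def canvas_hash(data_url: str) -> str:
    """SHA-256 of the PNG bytes in a canvas data URL ("data:image/png;base64,...")."""
    header, sep, payload = data_url.partition(",")
    if not sep or header != "data:image/png;base64":
        raise ValueError("expected a base64-encoded PNG data URL")
    # validate=True rejects non-base64 characters instead of silently skipping them.
    png_bytes = base64.b64decode(payload, validate=True)
    return hashlib.sha256(png_bytes).hexdigest()
```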
8.6 WebGL Rendering
A browser supporting WebGL and WebGL 2.0 gives a rendering hash that is, to an extent, unique, depending on the GPU context
8.7 System Fonts
The list of system fonts, extracted via Flash and JavaScript, can give a unique fingerprint on some machines, in
addition to the browser-specific web font capability.
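Once the font list has been collected (for example by the Flash or JavaScript probes above), it can be reduced to one attribute hash. A minimal sketch, assuming the client sends the list as plain font names; the function name is hypothetical.

```python
import hashlib
from typing import Iterable


def font_list_fingerprint(fonts: Iterable[str]) -> str:
    # Trim, case-fold and de-duplicate the names, then sort them so the
    # enumeration order reported by Flash/JavaScript does not affect the hash.
    names = sorted({f.strip().casefold() for f in fonts if f.strip()})
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()
```

Sorting trades a possible extra signal for robustness: if the enumeration order is itself stable for a device (as the observations in 10.1 suggest), hashing the list unsorted would keep that order as part of the fingerprint.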
8.8 Do Not Track Request
The Do Not Track (DNT) preference, as populated in JavaScript
8.9 DNS/ TCP (network)
TCP packets sent by the client when negotiating a connection carry values that differ between OS types and
versions; for example, the TTL in the IP header and the TCP window size differ between OS types.
If the User-Agent string is spoofed, it can be cross-verified against this network data, which is harder to spoof
and less frequently spoofed. DNS data, such as the DNS version, can also be used to tie a user to a particular DNS.
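The TTL cross-check can be sketched as a simple heuristic. Assumptions, all hypothetical here: the observed TTL is supplied by the network layer (for example passed on by the load balancer, as described in section 6), the client OS uses its default initial TTL (commonly 64 for Linux/macOS/Android and 128 for Windows), and no proxy or NAT device rewrites it. TCP window size is left out of this sketch.

```python
# Commonly observed default initial TTLs (assumption: OS defaults unmodified
# and no proxy/NAT rewriting the IP header).
DEFAULT_INITIAL_TTL = {64: "unix-like", 128: "windows", 255: "other"}


def guess_os_family(observed_ttl: int) -> str:
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    # Each router hop decrements the TTL by one, so round up to the nearest
    # default initial value.
    initial = min(t for t in DEFAULT_INITIAL_TTL if t >= observed_ttl)
    return DEFAULT_INITIAL_TTL[initial]


def ua_consistent_with_ttl(user_agent: str, observed_ttl: int) -> bool:
    family = guess_os_family(observed_ttl)
    claims_windows = "Windows" in user_agent
    if family == "windows":
        return claims_windows
    if family == "unix-like":
        return not claims_windows
    return True  # "other": inconclusive, so do not flag the visitor
```

A mismatch is best treated as one more weak signal of User-Agent spoofing rather than proof, since VPNs and proxies terminate the TCP connection themselves.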
8.10 Timezone
The client timezone and timestamp can be obtained by executing JS code in the browser
9. Scenarios for the model
The model aims to identify whether users who access the website are first-time or repeat users. Since IP
addresses are dynamic, and in places like corporate offices and business parks one IP is shared across a LAN
behind a single gateway to the WAN, it is difficult to narrow a user down with an IP fingerprint alone. With
proxies and VPNs being quite prevalent, it is no longer a good idea to rely on IP address fingerprinting only.
Quite simply, we want to detect whether:
a. A browser is a returning browser or not. If it is, we want to narrow it down to the device details
using other fingerprinting methods.
b. If it is not a returning browser but a new one, we still want to cross-match it against device
fingerprints using canvas/GPU data etc.
The rarer the browser, device OS or device hardware, the easier it is to uniquely track a visitor.
Decision Scenarios
User | Device | OS | Browser | Model scope
Same user | Device 1 | Same OS | Different browsers (Edge/IE/Tor/Firefox/Chrome etc.) | Should detect
Same user | Device 1 | Different OS (Win/OSX/Android/Linux) | - | Should detect (rarer device is better)
Same user | New device 2 | Same OS as row 1 (exact match) | Same browser as row 1 (exact match) | Cannot detect the user since the device is new
Different user | Device 1 | Same OS as row 1 | Different browser from row 1 | Should detect (the rarer the browser/OS, the better)
Different user | Device 1 | Different OS from row 1 | Any browser | Yes
Hence the success of the algorithm lies in uniquely identifying a device from different variables such as the
user agent, Flash and canvas data. A repeat user visit here means repeat access by the same client device, using
the same or a different browser, and the algorithm should return the timestamps of each previous visit.
10. Observations with different devices, browsers/OS & Network
Testing with https://panopticlick.eff.org, www.amiunique.org, https://browserleaks.com and
www.letmetrackyou.org showed the following behaviour of the attributes.
10.1 With a given browser and OS but different network connections
Connecting via home Wi-Fi, company Wi-Fi, a dongle or a different ISP, the attributes behave as follows:
No. | Attribute | Behaviour
1 | Accept header data | Doesn’t change
2 | User-Agent | Doesn’t change, unless updated
3 | DNT | Doesn’t change
4 | Touch Support | Doesn’t change
5 | Platform | Doesn’t change
6 | Language | Doesn’t change
7 | Cookies enabled | Doesn’t change
8 | Screen resolution | Doesn’t change
9 | Timezone | Doesn’t change
10 | Plugin versions | Doesn’t change, unless updated or removed
11 | Font List (all) | Order does not change
12 | Canvas Hash | Doesn’t change
13 | WebGL Hash | Doesn’t change
From this we can conclude that the network connection has the least impact on these attributes
10.2 Same device but different browsers/ OS
Switching between Edge, IE, Chrome and Firefox, and between Android and Windows, the attributes behave as follows:
No. | Attribute | Behaviour
1 | Accept header | Changes
2 | User-Agent | Changes
3 | DNT | Changes
4 | Touch Support | Doesn’t change
5 | Platform | Doesn’t change
6 | Language | Doesn’t change
7 | Cookies enabled | Doesn’t change
8 | Screen resolution | Doesn’t change
9 | Timezone | Doesn’t change
10 | Plugin version | Changes
11 | Font List (all) | Changes
12 | Canvas Hash | Changes
13 | WebGL Hash | Changes
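The attributes that stay the same across browsers can be combined into a coarse, browser-independent device key, which is what the "same device, different browser" rows of the decision scenarios need. A minimal sketch; the attribute key names are hypothetical.

```python
import hashlib

# Attributes observed in 10.2 NOT to change across browsers on one device.
# The key names are hypothetical.
BROWSER_INDEPENDENT = ("touch_support", "platform", "language",
                       "cookies_enabled", "screen_resolution", "timezone")


def cross_browser_key(attrs: dict) -> str:
    missing = [name for name in BROWSER_INDEPENDENT if name not in attrs]
    if missing:
        raise KeyError(f"missing attributes: {missing}")
    # Browser-specific attributes (user agent, canvas hash, ...) are ignored.
    joined = "|".join(f"{name}={attrs[name]}" for name in BROWSER_INDEPENDENT)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Most of these attributes carry a Low weight in section 11, so many devices share the same key. It is better used to shortlist candidate devices, which are then narrowed down with the higher-weight attributes.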
11. Estimating weights for each attribute for the model
From the https://panopticlick.eff.org results, it can be inferred that the lower the probability of a particular
attribute value, the rarer it is on the Internet, and hence the higher the chance of uniquely identifying the
visitor and the bigger the confidence score.
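Panopticlick expresses this rarity as surprisal: a value shared by one in X browsers (the "1/Prob" figures used in the case in section 12) carries log2(X) bits of identifying information. A minimal sketch; the naive total assumes the attributes are independent, which the multicollinearity note in section 12 warns against.

```python
import math
from typing import Iterable


def surprisal_bits(one_in_x: float) -> float:
    """Bits of identifying information for a value shared by 1 in X browsers."""
    if one_in_x < 1:
        raise ValueError("one_in_x must be >= 1")
    return math.log2(one_in_x)


def naive_total_bits(one_in_x_values: Iterable[float]) -> float:
    # Summing is only valid if the attributes are independent; correlated
    # attributes (e.g. canvas and WebGL hashes) would be double-counted.
    return sum(surprisal_bits(x) for x in one_in_x_values)


print(round(surprisal_bits(2.34), 2))      # 1.23  (DNT, Edge column)
print(round(surprisal_bits(31462.83), 2))  # 14.94 (Accept header, Edge column)
```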
No. | Attribute | Weight
1 | Accept header data | High
2 | User-Agent | High
3 | DNT | Low
4 | Touch Support | Low
5 | Platform | Low
6 | Language | Low
7 | Cookies enabled | Low
8 | Screen resolution | Medium
9 | Timezone | Low
10 | Plugin version | High
11 | Font List (all) | High
12 | Canvas Hash | High
13 | WebGL Hash | High
12. Suggested Methodology
Each visitor’s network and application data attributes will be extracted from the browser request headers and
by running JS code in the browser. Each of these attributes can be fingerprinted and its probability computed.
Further, canvas fingerprinting is possible by rendering a simple text on the client machine with the canvas API
and extracting the resulting pixel map. For WebGL fingerprinting, we can make the client’s graphics card draw
and then extract images; this gives a fingerprint that can be stored against the hardware data obtained from
Flash for further use.
The fingerprints of all these attributes combined result in a single user fingerprint, which can then be
statistically validated as unique at a given significance level (p-value).
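Combining per-attribute fingerprints into one user fingerprint can be sketched as hashing the sorted list of attribute hashes. The function names are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Mapping


def attribute_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def combined_fingerprint(attribute_hashes: Mapping[str, str]) -> str:
    # Sort by attribute name so the result does not depend on collection order.
    joined = ";".join(f"{name}:{digest}" for name, digest in sorted(attribute_hashes.items()))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

An exact combined hash changes as soon as any single attribute changes (for example after a browser upgrade, see Corner Case a). Storing the per-attribute hashes alongside it lets later visits be compared attribute by attribute and weighted, as the model below requires.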
With enough test data, we can derive the coefficients of all 13 variables above and the error constant in the
equation below, and over time the model can improve itself. The final output will be a test statistic that can be
compared with the mean and standard deviation to arrive at the confidence interval.
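$$y = \sum_{i=1}^{13} \beta_i x_i + \varepsilon = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{13} x_{13} + \varepsilon$$
where x_i is the variable for attribute i (numbered as in the tables above), \beta_i its coefficient and \varepsilon the error constant.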
The mean and standard deviation of y can be calculated from the outputs of the test sample.
Knowing a visitor’s output y, together with that mean and standard deviation, the z-value can be calculated and
a confidence interval set at 1 to 3 standard deviations.
For example, samples more than 2 standard deviations above the mean can be treated as new visitors, and those
below this threshold (roughly the 95% significance level) treated as repeat users, with the timestamps of their
previous visits shown.
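The z-value step and the 2-standard-deviation rule can be sketched as follows, assuming `sample_outputs` holds model outputs y from earlier visits (at least two values). The names and the default threshold are illustrative.

```python
import statistics
from typing import Sequence


def z_value(y: float, sample_outputs: Sequence[float]) -> float:
    """Standardise a visitor's model output y against earlier outputs."""
    mean = statistics.mean(sample_outputs)
    sd = statistics.stdev(sample_outputs)  # sample standard deviation, needs >= 2 values
    return (y - mean) / sd


def classify_visit(y: float, sample_outputs: Sequence[float], threshold: float = 2.0) -> str:
    # Following the example above: more than `threshold` standard deviations
    # above the mean -> new visitor; otherwise -> repeat visitor.
    return "new" if z_value(y, sample_outputs) > threshold else "repeat"
```

Using 1.96 instead of 2.0 as the threshold corresponds exactly to the two-sided 95% level under a normal assumption.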
Case: same device, different browsers (scenario 10.2). Weights are assigned as in section 11, and the
probabilities of the variables are taken from https://panopticlick.eff.org.
No. | Attribute | Weight | Percentage | Edge (1/Prob) | IE (1/Prob) | Chrome (1/Prob)
1 | Accept header data | High | 11% | 31462.83 | 31462.83 | 8.41
2 | User-Agent | High | 13% | 1716.15 | 522.93 | 35.17
3 | DNT | Low | 4% | 2.34 | 2.34 | 2.34
4 | Touch Support | Low | 3% | 1.41 | 1.41 | 1.41
5 | Platform | Low | 5% | 2.44 | 2.44 | 2.44
6 | Language | Low | 4% | 5243.81 | 5243.81 | 2.01
7 | Cookies enabled | Low | 4% | 1.13 | 1.13 | 1.13
8 | Screen resolution & color | Medium | 8% | 8.37 | 8.37 | 8.37
9 | Timezone | Low | 3% | 92.27 | 92.27 | 92.27
10 | Browser Plugin version | High | 11% | 3.17 | 188777 | 13.5
11 | Font List (all) | High | 12% | 4016.53 | 188777 | 188775
12 | Canvas Hash | High | 11% | 20975.22 | 1165.29 | 319.42
13 | WebGL Hash | High | 11% | 17161.55 | 293.13 | 21.19
Output (using the above equation) | | | | 0.116112455 | 0.078871 | 0.130456
According to Panopticlick, Edge and IE had the same entropy, while Chrome had slightly higher entropy. The output
above is likewise higher for Chrome than for Edge and IE, although the outputs for Edge and IE should have been
similar. Without test data, however, the weights of all the above attributes remain unvalidated.
Multicollinearity: some of the variables are highly correlated with each other and may not be suitable for
regression (e.g. canvas and WebGL hash; UA and Accept header), although this correlation has not yet been
substantiated with a test sample. A linear regression is also assumed, for simplicity.
13. Conclusion
Online financial services tend to reduce the information required from clients during on-boarding. They
simplify the process to what is necessary for complying with regulations, and thus open the door to abuse and
fraud. Moreover, borrowers tend to fill in multiple loan application forms, with the same and different lenders, in
case they get rejected by one.
Hence, preventing loan application and credit card fraud is the biggest use case for such a model, apart from
allowing ad networks and DSPs to track visitors uniquely when users have flushed their cookies or are browsing in
incognito mode. With network attributes like DNS and TCP, we can also cross-verify whether the user has
manipulated their browser’s user agent string.
14. Corner Case
a. Over time, a user’s fingerprint may change due to OS/browser upgrades. Since these values only move
forward (they won’t deprecate), provisions can be made in the algorithm to account for such changes.
b. Two devices with exactly the same hardware, OS and browser versions will always give the same fingerprint
and may not be distinguishable by any of these methods. In such instances, other attributes have to be
considered.
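One possible provision for corner case a is to compare user-agent strings with browser version numbers masked, so that a routine upgrade does not turn a repeat visitor into a "new" one. This is a hypothetical normalisation, not a complete one: the token list is an assumption, and OS versions (e.g. "Windows NT 10.0") are left untouched.

```python
import re

# Hypothetical normalisation: mask the version after common browser/engine
# tokens so a routine browser upgrade does not change the compared value.
VERSIONED_TOKEN = re.compile(r"\b(Chrome|Safari|AppleWebKit|Firefox|Edge)/[\d.]+")


def version_insensitive_ua(ua: str) -> str:
    return VERSIONED_TOKEN.sub(r"\1/*", ua)


ua_v55 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
ua_v56 = ua_v55.replace("Chrome/55.0.2883.87", "Chrome/56.0.2924.87")
print(version_insensitive_ua(ua_v55) == version_insensitive_ua(ua_v56))  # True
```

The exact user-agent string can still be stored and weighted separately; the masked form only helps to link the upgraded browser to its earlier visits.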
15. Experimental
a. MAC collection: some languages are said to provide the remote server with the client MAC address, but as of
now this is limited to PHP, which would need to run the script on the client end. Note that a MAC address is not
carried beyond the local network segment, so a remote server cannot normally see it.
b. Keyboard fingerprinting: a user’s typing speed and word usage can also be used when other methods fail to
give a conclusive result.
c. Audio fingerprinting: the HTML5 AudioContext (Web Audio) API can be used to fingerprint website visitors.
d. Battery fingerprinting: the HTML5 Battery Status API lets the browser see how much battery life, as a
percentage, is left in the device.
16. References for further reading
1. https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy
2. https://zyan.scripts.mit.edu/presentations/toorcon2015.pdf