
Abstract—The web is a worldwide collection of systems
providing a variety of information and communication
services across the web. The internet as we know it today is an
outstanding success, with more than 1,300,000,000 users
connected across the globe. Since users mostly
surf the web for resources, it is worth noting
that the web is the most popular internet service.
This paper focuses on the various ways in which users are
monitored and tracked while surfing the web, including the
current methods used by websites to track users. The
paper also enumerates how users can protect
themselves from being tracked and highlights the
importance of privacy.
Keywords: Cookies, Tracking devices, Authentication
protocols, Server and proxy logs, Eavesdropping,
Scripting.
I. Introduction
A good way to gain insight into the users and customers
who visit a website is to track their areas of
interest as they surf the web. This paper therefore focuses
on ways in which a user can be tracked during web surfing using
internet tools such as cookies, web bugs, server logs and
log files, and JavaScript running in the browser.
A single tool cannot reliably identify a user who is surfing the
web, but a combination of tools can be used
together in order to identify a particular user's private details,
such as name, age, address, email, location and frequently visited
websites, as addressed in this paper. The privacy
protection options available to internet users are also
addressed. The information gathered enables a
company to improve the services rendered to users and helps top
strategic managers to devise new business strategies and
objectives and to meet goals.
II. Methodology
A step-by-step approach is used to track users and
to identify their Personally Identifiable Information (PII)
with the following tools: cookies, scripts, and server and
proxy log files. The approach follows a request from the moment
it is triggered in the client's browser until it reaches the server,
together with the activities that occur along the way.
Fig: 1 Physical Representation.
Fig 1 above illustrates how a user visits a site.
The sample user is introduced into the Wi-Fi network as a new
party. An attacker running tools which perform access-point spoofing in
addition to packet-sniffing software can then read all data, such as scripts,
cookies and server logs, sent between the client and the
HTTP server located on the internet.
The figure below illustrates the set-up used to eavesdrop on the
connection between the client and the internet server.
Web Security
Atsegwasi Otsemhuno Rogers
RA956@live.mdx.ac.uk. M00478276
The access-point spoofing tool tricks the router into using a
compromised computer as an access point to the HTTP server
located on the internet.
Internet activity between the client and the internet server
passes through this compromised access point and can be logged. As the
access point is on the same Wi-Fi network, the
location of the client computer can easily be traced to within a
10 m radius.
Packet-sniffing tools run on the compromised
system invisibly capture cookie data, session data, browser-agent data,
the client IP address, the target IP address, usernames, passwords and
much other information.
Scripts or cookies in transit can be modified to
add malicious content that could harm the client's
computer.
In addition, sensitive information such as usernames and
passwords can be seen in plain text if it is not encrypted.
This data can be used to impersonate the client on the
internet.
III. Related Work
The advancement of the web has ushered in the philosophy of
unobtrusive use of the web. This advancement has also
made data a valuable asset. Data proves invaluable for enhancing
user experience, providing a personalized feel and improving
services for users on the web [14].
Providing personalized services to users makes them feel
special and hence makes them loyal users of the web service.
Several researchers have experimented with ways in
which data on the web can be used to personally identify users
and hence give them web results based on their unique
identity [5][13][16][17][18]. This is termed 'user
profiling' [16]. The tools available to internet platforms to
achieve this include cookies, browser caches, proxy servers,
browser agents, web logs and search logs. Data accumulated from
these sources enables internet platforms to understand their
users better.
Research by Xia and Brustoloni examined the extent
of the personal information disclosed on the internet [10]. In
a case study of sample users, they found that over 90%
of users had publicly submitted information about their real
name, pictures, email address, date of birth, relationship status
and interests. While just 10% of sample users disclosed
their physical address, this can easily be
obtained with geo-location tools.
This information can be used for positive purposes [13][17].
However, when it falls into the wrong hands, it
could be used for criminal and harmful purposes. One of the
techniques used is eavesdropping [20], for example with Airsnarf, an
access point tool with integrated DHCP, DNS and HTTP
spoofing. The tool enables attackers to re-associate a client
computer with a rogue access point. This is done by making
the rogue signal stronger than the legitimate access point's signal (with
the aid of antennas). Packets between the client and the server
are then easily intercepted, and they hold important information [13].
Besides personal information, the physical address can be
determined from this [17].
Mobasher, Cooley and Srivastava proposed the mathematical
formula used to process the vast amount of data gathered from
a web user. They stated that a database of user profiles is
represented as $UP = [s_{u_k}(i_j)]_{m \times n}$, where $UP$ represents the
universal set of user profiles and $s_{u_k}(i_j)$ represents the degree of
interest in an item (represented by $i_j$) by a user (represented by $u_k$).
They stated that this formula is used by analytics firms to
provide personalized products or web experiences [14].
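The profile matrix described above can be illustrated with a minimal sketch. It assumes that the rows of UP correspond to m users and the columns to n items, and that the degree of interest is approximated by each item's share of a user's page views. The user names, item names, view counts and the helper build_profile_matrix are all hypothetical and are not taken from [14].

```python
# Hypothetical users and items, for illustration only.
users = ["u1", "u2", "u3"]                      # m = 3 users
items = ["sports", "music", "health", "news"]   # n = 4 items

# Made-up page-view counts per user and item (an implicit interest signal).
views = {
    "u1": {"sports": 8, "music": 2},
    "u2": {"health": 5, "news": 5},
    "u3": {"music": 1},
}


def build_profile_matrix(users, items, views):
    """Return UP as an m x n list of lists.

    Entry [k][j] is the assumed degree of interest of user k in item j,
    computed here as that item's share of the user's total views.
    """
    matrix = []
    for user in users:
        counts = views.get(user, {})
        total = sum(counts.values())
        row = [counts.get(item, 0) / total if total else 0.0 for item in items]
        matrix.append(row)
    return matrix


UP = build_profile_matrix(users, items, views)
for user, row in zip(users, UP):
    print(user, [round(score, 2) for score in row])
```

Any other interest measure (ratings, time spent on a page) could fill the same matrix; the view-share measure is only one simple choice.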
Data is collected through explicit feedback or implicit
feedback [16]. Implicit feedback is unobtrusive and uses
cookies, browser caches, proxy servers and the other tools to
gather information about user habits and profiles. Explicit
feedback requires users to knowingly submit information
about themselves to the website. Explicit feedback
uses selection tools, surveys and feedback forms to gather
information about users' personas, habits and preferences.
Both modes of feedback have their advantages and
disadvantages [16].
In their research, Quiroga and Mostafa compared the
strength of each feedback mode in a test in which the system
suggests the documents needed by a specific user based on his
or her profile [16].
Each of 18 users worked with a system containing
6,000 health records, categorized into 15
different areas. Each user had to use all 15 categories.
For explicit feedback, the user was presented with a form
which collected the user's preferences based on the
suggestions given. The data entered by the user served as his or
her profile and was used to provide automatic suggestions of
documents needed by the user.
For implicit feedback, on the other hand, the usage activity on
the system and the documents viewed were automatically logged
and used as the user's profile. As with explicit
feedback, the user's profile was used to suggest
the health records in which the sample users would be
interested.
Finally, the accuracy of the implicit and explicit feedback modes
was compared with the accuracy of a system which made use of
both methods.
The results showed that while the accuracy
of the explicit and implicit feedback modes was almost the same,
the accuracy of the system which made use of both feedback
methods was far greater [16].
The end result highlighted how automated systems
can help provide a better picture of the user when combined
with explicit feedback.
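The difference between the three kinds of profile can be sketched as follows. This is only an illustration and not the method used in [16]: the category names, the equal weighting of the two profiles and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical subset of document categories.
CATEGORIES = ["cardiology", "nutrition", "oncology"]


def explicit_profile(chosen):
    """Profile built from categories the user ticked on a form."""
    return {c: 1.0 if c in chosen else 0.0 for c in CATEGORIES}


def implicit_profile(viewed):
    """Profile built from logged views: each category's share of all views."""
    counts = Counter(viewed)
    total = sum(counts[c] for c in CATEGORIES)
    return {c: counts[c] / total if total else 0.0 for c in CATEGORIES}


def combined_profile(explicit, implicit, weight=0.5):
    """Weighted mix of both profiles (equal weights are an assumption)."""
    return {c: weight * explicit[c] + (1 - weight) * implicit[c]
            for c in CATEGORIES}


def recommend(profile, top_n=1):
    """Return the top_n categories with the highest interest score."""
    return sorted(CATEGORIES, key=lambda c: profile[c], reverse=True)[:top_n]


explicit = explicit_profile({"nutrition"})
implicit = implicit_profile(
    ["cardiology", "cardiology", "cardiology", "nutrition"])
combined = combined_profile(explicit, implicit)
print(recommend(explicit), recommend(implicit), recommend(combined))
```

In this made-up case the stated preference and the observed behaviour disagree, and the combined profile weighs both signals rather than relying on one.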
In research by Teevan et al., a client-side
profiling agent was given information such as browsing
history, documents stored on the computer and email history.
They found that the more information was available to the
profiling agent, the better the profile performed in
providing results which matched the user's intentions. In
addition, results produced with a
profile outperformed those from a non-personalized search [18].
Results from several other researchers back up the usefulness
of the data collected by user-tracking tools.
IV. Literature Review
Cookies:- When a user visits a website, cookies are sent to the
client's browser to uniquely identify that browser [2]. The browser
then sends them back together with each later request made by the user. Cookies are
small text files stored by the browser on a computer when a
person visits a website [2].
Cookies usually contain a serial number which uniquely
identifies a user. Once they are put on a user's computer, they
track the user's activity on the website and send this
information back to the website owners [6]. Whenever a user
returns to the website, the web server uses the unique
identifier to retrieve the user's record from its database.
There are two classes of cookies: session cookies and
persistent cookies. Cookies are generally used for session
handling, authentication, identification of clients and storage
of site preferences.
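The identifier mechanism described above can be sketched with Python's standard http.cookies module. The cookie name visitor_id, the in-memory visitor_db and the handle_request helper are hypothetical; a real site would use its own names and a real database.

```python
from http.cookies import SimpleCookie
import uuid

ONE_YEAR = 60 * 60 * 24 * 365   # seconds
visitor_db = {}                 # hypothetical store: visitor id -> record


def handle_request(cookie_header=None):
    """Return (visitor_id, set_cookie_value) for one incoming request.

    set_cookie_value is None when the browser already sent a known id.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get("visitor_id")
    if morsel is not None and morsel.value in visitor_db:
        # Returning visitor: look up the stored record by its identifier.
        visitor_db[morsel.value]["visits"] += 1
        return morsel.value, None

    # New visitor: issue a fresh identifier.
    visitor_id = uuid.uuid4().hex
    visitor_db[visitor_id] = {"visits": 1}
    reply = SimpleCookie()
    reply["visitor_id"] = visitor_id
    reply["visitor_id"]["path"] = "/"
    # Max-Age makes this a persistent cookie; leaving it out would give a
    # session cookie that the browser discards when it is closed.
    reply["visitor_id"]["max-age"] = ONE_YEAR
    return visitor_id, reply["visitor_id"].OutputString()


first_id, set_cookie = handle_request()
print("Set-Cookie:", set_cookie)
returning_id, _ = handle_request("visitor_id=" + first_id)
print(returning_id == first_id, visitor_db[first_id])
```

The only difference between the two classes of cookie in this sketch is the presence of an expiry attribute.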
Normal cookies (Session Cookies) are data saved by a
website onto the user's computer during a visit to the site.
Session cookies reside on a client's
computer for the duration of his/her browsing session [12].
When the browser is closed, these cookies are deleted
automatically. They do not store any personal information.
They are used by commercial websites and are mostly
employed for shopping cart functionality [9].
On the other hand, Tracking cookies (Persistent Cookies)
are a specialized type of cookie that can be shared by multiple
websites. Persistent cookies remain stored in a user's browser
even after the browsing session ends [12]. These types of
cookies are used to identify individual users and are also used by
website owners to analyze user surfing habits on their
websites [17]. Cookies keep track of which advertisements the
user has already seen on the site. Personal information, however, is
not generated by the cookies but by the user's own input into the
website through order forms, registration pages, payment
pages and other online forms [10][12].
Flash cookies are stored by the Adobe Flash plugin. These
cookies usually back up data from regular cookies. If a user
deletes regular cookies, Flash cookies still keep the data. A
website that placed a cookie on a client's computer can still
recognize the user even when the cookie is deleted, as long as
it is backed up in a Flash cookie[3].
Cookies have several benefits for web users. They remember
personal information (such as name, address, payment
information, emails and many others), so one does not need to
refill website forms or perform the same tasks over and over
again [10]. Cookies remain on a user's computer for a long
time, thereby making a website much
easier for the client to access [10].
However, depending on a website's policy, the data collected
by cookies may be sold to third parties such as marketing
firms, advertisers and junk mailers [13].
Cookies can be a very powerful tool for tracking a particular user.
Because cookies help remember the online
footprint of a returning user, once the client computer's
IP address is located with geo-location tools (such as JavaScript and the Google
Maps API), the physical location of a user can be matched
with his/her online profile or footprint.
Scripts:- Code that runs in a web browser is one of the most
powerful tools used to track user activity online. These scripts
are based on JavaScript. They can be either client-side
JavaScript or server-side JavaScript implementations whose
output is delivered to the browser in a form it can read.
JavaScript was created in 1995 to allow the browser to
become more interactive. Since then, it has become a language
used in network programming, game development and the
creation of mobile and desktop applications [4].
Marketers add JavaScript to their websites through the source
code or the template of their web pages to collect visitor data.
When visitors visit their web pages for the first time, a
script is sent which generates user browser data for
storage on the client computer.
The collected data is stored for a long time, so when the
user returns, the stored data can identify him or her as a returning user. For
storing such data, JavaScript is aided by cookies [15].
Server-side JavaScript implementations, in turn, are
processed by the server, and the result is sent to the browser for
further processing. This code collects information on
location data, social network activity, music selections, movie
preferences and user behaviour.
This data is sent back to the marketers or social networks,
stored, analyzed and put to various uses [15].
Due to the vast amount of data which can be collected by
JavaScript, privacy activists and organizations fight to curtail
the excessive use of JavaScript for tracking users online [13].
One advantage of using JavaScript for user tracking is
that it mostly does not obstruct the user experience on
websites. It often enhances the user experience
by tailoring products to the
user profile, and it sometimes provides useful suggestions as to
related products or information which a user might need [4].
In addition, as browsers become faster and more stable,
JavaScript code can run very fast in the background while the
user undertakes his or her web activities [1][4].
Furthermore, the user still has control over the use of these
scripts. By customizing browser JavaScript preferences or by
installing tools such as NoScript, the user can block scripts that are not
needed [15][17][20].
There are, however, several disadvantages of scripts. They
provide an opportunity for hackers and malicious script
authors to run potentially harmful scripts on a client's computer.
Browser vendors aim to
limit this problem by running scripts in a sandbox and
restricting sites to a same-origin policy [1].
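The same-origin rule mentioned above compares the scheme, host and port of two URLs. The following is a simplified sketch of that comparison, not a browser's actual implementation; the function names and the example URLs are chosen for illustration.

```python
from urllib.parse import urlsplit

# Default ports, so that https://example.com and https://example.com:443
# are treated as the same origin.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True when both URLs share scheme, host and port."""
    return origin(url_a) == origin(url_b)


print(same_origin("https://example.com/a", "https://example.com:443/b"))
print(same_origin("https://example.com/", "https://ads.example.com/"))
print(same_origin("http://example.com/", "https://example.com/"))
```

Under this rule a script loaded from one origin cannot freely read data belonging to a page from another origin, which is why a subdomain or a different scheme counts as a different site.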
Social Networks Tracking: This comprises both online social
networks (OSN) and mobile online social networks, which are
social-based services such as Facebook, MySpace,
Twitter, Instagram and many more. With the help of these
immense social-based services, individuals have been able to
share some of their personal information with a number of
entities, such as companies, events and public places, and to share their current
locations with the use of the Geolocation API, Geo latitude API
and check_In plug-in [15][17].
Furthermore, browsers such as Chrome, Internet Explorer, Firefox,
Safari and Opera support these functions.
Geolocation is much more accurate on devices with GPS
compatibility, such as smartphones. Geolocation and Geo
latitude make use of web technologies such as JavaScript, HTML,
CSS and PHP to operate effectively, and they are associated
with Google Maps in determining a user's position during
tracking [13][17].
Keyloggers: As the name suggests, key-loggers log the keys
pressed on users' computers. Key-loggers can be either
software or hardware devices [11].
Legitimate programs use key-logging functionality to capture
hotkeys and provide additional functionality to the user.
Key-loggers can also be used illegitimately.
These run invisibly while recording every keystroke
made. They also log browsing activity, applications used and
screenshots of the computer [11].
Key-loggers are sometimes installed as spyware via bugged or
cracked software. The user believes he is installing
legitimate software but, inadvertently, the key-logger is installed
separately and silently.
Legitimate key-loggers, however, are installed along with
bundled software to enhance user experience.
An advantage of using key-loggers to track users is that data
collected from the client's system can be used to improve
products and services [11][15]. Such products and services
include auto-complete and spell-check features.
Server Logs: A server log file is one of the tools employed by
websites to track the activities of their users online or on the
website [7]. The log file is a file (or sometimes several files)
created by the server that records all activities/requests
performed by the user.
Whenever a user visits a web site, the web server
automatically collects information on the new user. Typical
server log information includes the IP address of the client, referrer
link, operating system version, user agent, page requested and
time/date of the client request [7].
A typical server log file looks like the image below:
Figure: A typical server log[7]
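As an illustration of the fields listed above, the sketch below parses one log line in the widely used Apache Combined Log Format. The sample line is made up for the example (the IP address comes from a range reserved for documentation) and is not taken from the figure or from [7]; the pattern and function names are likewise hypothetical.

```python
import re

# Pattern for one line in Combined Log Format:
# ip ident user [time] "method page protocol" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Made-up sample entry, for illustration only.
sample_line = (
    '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] '
    '"GET /products/index.html HTTP/1.1" 200 2326 '
    '"http://www.example.com/start.html" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


entry = parse_log_line(sample_line)
for field in ("ip", "time", "page", "referrer", "user_agent"):
    print(field, "=", entry[field])
```

The user agent string also carries the operating system version, which is how that item typically ends up in a server log.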
While server logs do not typically collect information on
specific users, the information provided can be used for tracking a
user's browsing activity or pattern. It is important to note that
users have no control whatsoever over the data collected by
web servers. This raises ethical and legal questions over data
mining [13]. Server logs are accessible only to the web
administrator or the webmaster. Information gathered from
server logs is typically used to analyze web traffic
patterns, URL referrers or user agents.
Data logging by servers has both advantages and
disadvantages.
One advantage of server logs is that they aid in
resolving issues. When a customer has
problems using an e-commerce website, a little
information provided by the client is enough for technical problems to be
resolved, thanks to the abundant information in the
server log(s) [7].
Besides, marketers use the log file to monitor trends and
tailor products and services to meet the demand of end-users. In
addition, information provided by the server logs helps
webmasters and system administrators fix loopholes and
better ensure optimum connectivity for their web clients [6][7].
On the other hand, the lack of transparency about what
data is logged can raise privacy and, subsequently, legal issues.
Users have no control over how companies process
information obtained through server logs [13]. While a Terms
of Service document may provide answers to this, there are
situations in which there is no such document [13].
Furthermore, since users have no control over how
organizations process data logs, user tracking information
such as IP addresses or page requests can be sold to third
parties without the end-client's consent. Data collected from
server logs can easily be stored in a database for further
analysis or usage [13].
Server log analytics software includes Google Analytics, Deep
Log Analyzer, AWStats, Piwik and Webalizer. These tools
provide several indicators to marketers or website owners [7].
Scenario: A growing online marketplace with about 3,000
users, seen as one of the most popular leading companies
providing internet services that let users earn an income from
home using a PC and an internet connection.
V. Proposed Method
The proposed method involves steps to obtain user tracking
information. Information such as usernames, passwords, cookie
data, browsing activity and IP addresses will be gathered and
stored in log files. The unsuspecting user is attacked without his
knowledge. While browsing the internet, he may fail to
notice that the certificates sent to his computer are spoofed
certificates self-signed by the attacker. However, anti-virus
software and modern browsers could trigger warnings to the
unsuspecting user.
If the user fails to heed these warnings and continues browsing, all
data between the user and the internet passes through the
attacker's computer. From then on, no further
warnings are given to the user.
Fig: 2 Logical Representation:
Our sample user is called Benedict, a student of Middlesex
University. His web surfing session begins when he starts
surfing and ends when he quits the browser. Our test subject is
about to browse Facebook, check his emails on Yahoo and
view course materials on Unihub.
Figure __: Benedict has to provide Google with information such as his name,
email address, mobile phone number, date of birth and gender.
Benedict provides his username and password on all three
websites.
Meanwhile, on another computer, the attacker Ivan uses Cain
and Abel to scan the wireless network for computers
available for exploitation. In Cain and Abel, Ivan resolves
the host names of the discovered systems and sees Benedict's
computer online.
He begins his man-in-the-middle attack by choosing to
capture data passing between the wireless router and Benedict's system.
This mode of attack tricks Benedict's computer into thinking that Ivan's
computer is the router. The router in turn treats Ivan's computer
as Benedict's computer. He thereafter starts poisoning traffic
between the router and Benedict.
When Benedict clicks the log-in button after entering the username and
password, this data is captured in plain text and can be
seen by Ivan on his computer.
A cookie is placed on Benedict's computer upon login. This
cookie is also captured by Ivan and stored in several log files.
Besides using the username and password to log in to any
of Benedict's sites, Ivan uses the Cookie Manager Firefox add-on
to replace his own Facebook cookie data in Firefox with Benedict's. Thereafter,
when Ivan accesses Facebook, he is recognized as Benedict
and can masquerade as him.
Conclusion
In conclusion, this report has analyzed how users are tracked
on the internet. It highlighted the advantages and
disadvantages of user tracking. It also briefly outlined what
the information from user tracking is used for. Finally, it
outlined a sample exercise in which a man-in-the-middle
attack was conducted in order to gather user tracking
information which passes over the internet. This exercise was
successful, and the attack can be replicated.
I. BIBLIOGRAPHY
[1] ADsafe, Making JavaScript Safe for Advertising., 2015.
[2] Allaboutcookies, All About Computer Cookies - privacy
concerns on cookies, 2015.
[3] M. Brinkmann, Flash Cookies explained - gHacks Tech
News, 2007.
[4] D. Flanagan and P. Ferguson, JavaScript, 5 ed., O'Reilly,
2006.
[5] S. Gauch, M. Speretta, A. Chandramouli and A.
Micarelli, "User profiles for personalized information
access.," The adaptive web, vol. 1, no. 2, pp. 54-89, 2007.
[6] Java Republic, Privacy Policy - Java Republic, 2015.
[7] L. Joshila Grace, V. Maheswari and D. Nagamalai,
"Analysis of Web Logs And Web User In Web Mining,"
International Journal of Network Security & Its
Applications, vol. 3, no. 1, pp. 99-110, 2011.
[8] N. Kamaraj and M. Chandran, "Tracking Down Travel
Agencies Geo-location using Software Engineering,"
IJCTT, vol. 9, no. 2, pp. 49-52, 2014.
[9] J. P. Kesan and R. C. Shah, "Deconstructing Code," Yale
Journal of Law & Technology, vol. 6, pp. 277-389, 2004.
[10] D. M. Kristol, "HTTP Cookies: Standards, privacy, and
politics," ACM Transactions on Internet Technology, vol.
1, no. 2, pp. 151-198, 2001.
[11] M. Kusuma-Atmadja, "Some Thoughts on ASEAN
Security Co-Operation: An Indonesian Perspective,"
Contemporary Southeast Asia, vol. 12, no. 3, pp. 161-
171, 1990.
[12] Microsoft, Description of Persistent and Per-Session
Cookies in Internet Explorer, 2007.
[13] A. D. Miyazaki, "Online Privacy and the Disclosure of
Cookie Use: Effects on Consumer Trust and Anticipated
Patronage," Journal of Public Policy & Marketing, vol.
27, no. 1, pp. 19-33, 2008.
[14] B. Mobasher, R. Cooley and J. Srivastava, "Automatic
personalization based on Web usage mining," Commun.
ACM, vol. 43, no. 8, pp. 142-151, 2000.
[15] OpenTracker, How does user-tracking work?, 2015.
[16] L. M. Quiroga and J. Mostafa, "Empirical evaluation of
explicit versus implicit acquisition of user profiles in
information filtering systems.," in Proceedings of the
fourth ACM conference on Digital libraries, ACM, 1999,
pp. 238-239.
[17] N. Schmuker, Web Tracking, 1 ed., Berlin University of
Technol, 2011, pp. 1-3.
[18] J. Teevan, S. T. Dumais and E. Horvitz, "Personalizing
search via automated analysis of interests and activities,"
in Proceedings of the 28th annual international ACM
SIGIR conference on Research and development in
information retrieval, ACM, 2005, pp. 449-456.
[19] Wikipedia, HTTP cookie, 2015.
[20] H. Xia and J. C. Brustoloni, "Hardening web browsers
against man-in-the-middle and eavesdropping attacks.,"
in Proceedings of the 14th international conference on
World Wide Web, ACM, 2005.

More Related Content

What's hot

A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining Editor IJMTER
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technologyanchalsinghdm
 
Framework for web personalization using web mining
Framework for web personalization using web miningFramework for web personalization using web mining
Framework for web personalization using web miningeSAT Publishing House
 
Volume 2-issue-6-2056-2060
Volume 2-issue-6-2056-2060Volume 2-issue-6-2056-2060
Volume 2-issue-6-2056-2060Editor IJARCET
 
Automatic Recommendation for Online Users Using Web Usage Mining
Automatic Recommendation for Online Users Using Web Usage Mining Automatic Recommendation for Online Users Using Web Usage Mining
Automatic Recommendation for Online Users Using Web Usage Mining IJMIT JOURNAL
 
Automatic recommendation for online users using web usage mining
Automatic recommendation for online users using web usage miningAutomatic recommendation for online users using web usage mining
Automatic recommendation for online users using web usage miningIJMIT JOURNAL
 
Presentation On CLoudSweeper By Harini Anand
Presentation On CLoudSweeper By Harini AnandPresentation On CLoudSweeper By Harini Anand
Presentation On CLoudSweeper By Harini AnandHarini Anandakumar
 
BIDIRECTIONAL GROWTH BASED MINING AND CYCLIC BEHAVIOUR ANALYSIS OF WEB SEQUEN...
BIDIRECTIONAL GROWTH BASED MINING AND CYCLIC BEHAVIOUR ANALYSIS OF WEB SEQUEN...BIDIRECTIONAL GROWTH BASED MINING AND CYCLIC BEHAVIOUR ANALYSIS OF WEB SEQUEN...
BIDIRECTIONAL GROWTH BASED MINING AND CYCLIC BEHAVIOUR ANALYSIS OF WEB SEQUEN...ijdkp
 
UProRevs-User Profile Relevant Results
UProRevs-User Profile Relevant ResultsUProRevs-User Profile Relevant Results
UProRevs-User Profile Relevant ResultsRoyston Olivera
 
Multitenency - Solving Security Issue
Multitenency - Solving Security Issue Multitenency - Solving Security Issue
Multitenency - Solving Security Issue MANVENDRA PRIYADARSHI
 
Sending the data already gathered from the client to the Server
Sending the data already gathered from the client to the ServerSending the data already gathered from the client to the Server
Sending the data already gathered from the client to the Serverhussam242
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...inventionjournals
 
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Editor IJCATR
 
IRJET- Noisy Content Detection on Web Data using Machine Learning
IRJET- Noisy Content Detection on Web Data using Machine LearningIRJET- Noisy Content Detection on Web Data using Machine Learning
IRJET- Noisy Content Detection on Web Data using Machine LearningIRJET Journal
 

What's hot (17)

Web Content Mining
Web Content MiningWeb Content Mining
Web Content Mining
 
A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
 
Framework for web personalization using web mining
Framework for web personalization using web miningFramework for web personalization using web mining
Framework for web personalization using web mining
 
H0314450
H0314450H0314450
H0314450
 
Volume 2-issue-6-2056-2060
Volume 2-issue-6-2056-2060Volume 2-issue-6-2056-2060
Network Security

Web Security
Atsegwasi Otsemhuno Rogers
RA956@live.mdx.ac.uk, M00478276
Abstract—The web is a worldwide collection of systems providing a variety of information and communication services. The internet as we know it today is an outstanding success: more than 1.3 billion users are connected across the globe. Since users surf the web for resources, it is worth noting that the web is the most popular internet service. This paper focuses on the various ways users are monitored and tracked while surfing the web, including the methods currently used by websites. It then enumerates how users can protect themselves from being tracked and highlights the importance of privacy.
keywords: Cookies, Tracking devices, Authentication protocols, Server and proxy logs, Eavesdropping, Scripting.
I. Introduction
A great way to gain insight into the users and customers who visit a website is to track their areas of interest as they browse. This paper focuses on ways a user can be tracked during web surfing using internet tools such as cookies, web bugs, server logs and log files, and JavaScript running in the browser.
No single tool can reliably identify a user while surfing the web, but several tools can be combined in order to identify a particular user's private details such as name, age, address, email, location and frequently visited websites, as addressed in this paper. Privacy protection options available to internet users are also addressed. The information gathered enables a company to improve the services rendered to users and helps senior managers devise new business strategies and objectives and meet their goals.
II. Methodology
A step-by-step approach is used to track users and identify their Personally Identifiable Information (PII) using the following tools: cookies, scripts, and server and proxy log files. The approach follows a request from the moment it is triggered in the client's browser until it reaches the server, together with the activities that occur along the way.
Fig: 1 Physical Representation.
Fig. 1 illustrates how a user visits a site when a new party joins the same Wi-Fi network as the sample user. By running tools that perform access-point spoofing together with packet-sniffing software, the attacker can read all data, such as scripts, cookies and server logs, sent between the client and the HTTP server on the internet. The figure illustrates the set-up used to eavesdrop on the connection between the client and the internet server.
The access-point spoofing tool tricks the router into using a compromised computer as the access point to the HTTP server on the internet. All internet activity between the client and the internet server passes through this compromised access point. Because the access point is on the same Wi-Fi network, the location of the client computer can be traced to within a 10 m radius. Packet-sniffing tools running on the compromised system invisibly capture cookie data, session data, browser-agent data, the client IP address, the target IP address, usernames, passwords and much other information. Scripts or cookies in transit can be modified to add malicious content that would harm the client's computer. In addition, sensitive information such as usernames and passwords can be seen in plain text if it is not encrypted. This data can be used to impersonate the client on the internet.
III. Related Work
The advancement of the web has ushered in the philosophy of non-obtrusive use of the web and has made data a valuable asset. This data proves invaluable for enhancing user experience, providing a personalized feel and improving services for users [14]. Personalized services make users feel special and hence make them loyal to the web service. Several researchers have experimented with ways in which data on the web can be used to personally identify users and so give them results based on their unique identity [5][13][16][17][18]. This is termed 'user profiling' [16]. The tools available to internet platforms for this purpose include cookies, the browser cache, proxy servers, browser agents, web logs and search logs. Data accumulated from these sources enables internet platforms to understand their users better.
Xia and Brustoloni investigated the extent of the personal information disclosed on the internet [10].
In a case study of sample users, they found that over 90% had publicly submitted information about their real name, pictures, email address, date of birth, relationship status and interests. Although just 10% of the sample users disclosed their physical address, it can easily be obtained with geo-location tools. This information can be put to beneficial use [13][17]. However, in the wrong hands it could be used for criminal and harmful purposes.
One of the techniques used is eavesdropping [20], for example with Airsnarf, an access-point tool with integrated DHCP, DNS and HTTP spoofing. The tool enables an attacker to re-associate a client computer with a rogue access point by amplifying the rogue signal over the legitimate access point's signal (with the aid of antennas). Packets between the client and the server are then easily intercepted, and they hold important information [13]. Besides personal information, the physical address can be determined from them [17].
Mobasher, Cooley and Srivastava proposed a mathematical formulation for processing the vast amount of data gathered from web users. They represent a database of user profiles as $UP = [s_{u_k}(i_j)]_{m \times n}$, where $UP$ is the universal set of user profiles and $s_{u_k}(i_j)$ is the degree of interest of user $u_k$ in item $i_j$. They stated that this representation is used by analytics firms to provide personalized products or web experiences [14].
Data is collected through explicit or implicit feedback [16]. Implicit feedback is unobtrusive and uses cookies, the browser cache, proxy servers and other tools to gather information about user habits and profiles. Explicit feedback requires users to knowingly submit information about themselves to the website, through selection tools, surveys and feedback forms that gather information about users' personas, habits and preferences.
Both modes of feedback have their advantages and disadvantages [16]. Quiroga and Mostafa compared the strength of each mode in a test in which a system supplies the documents a specific user requests based on his or her profile [16]. Eighteen users each worked with a system containing 6,000 health records organized into 15 categories, and each user had to use all 15 categories. For explicit feedback, the user was presented with a form that collects the user's preferences from a list of suggestions. This input serves as the user's profile and is used to automatically suggest documents the user needs. For implicit feedback, usage activity on the system and the documents viewed are logged automatically and used as the user's profile. As with explicit feedback, the profile is used to suggest health records the sample users would be interested in. Finally, the results from the implicit and explicit modes were compared with the accuracy of a system that used both. The results showed that while the accuracy of the explicit and implicit modes was similar, the system that used both was considerably more accurate [16]. This highlights how automated systems, combined with explicit feedback, can help build a better picture of the user.
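The profile matrix $UP$ and the two feedback modes discussed in this section can be illustrated with a small sketch. It is not the procedure of [14] or [16]: the users, items, view counts, ratings and the equal-weight blend are all invented for illustration, and the layout (one row per user, one column per item) is an assumption about how the matrix is read.

```python
# Hypothetical data: rows are users u_k, columns are items i_j.
users = ["u1", "u2"]
items = ["cardiology", "nutrition", "oncology"]

# Implicit feedback: page-view counts taken from usage logs (invented numbers).
views = {"u1": {"cardiology": 8, "nutrition": 2}, "u2": {"oncology": 5}}

# Explicit feedback: ratings in 0..1 that the user entered in a form (invented numbers).
ratings = {"u1": {"nutrition": 1.0}, "u2": {"oncology": 0.6, "cardiology": 0.2}}


def implicit_score(user, item):
    """Share of the user's logged views that went to this item."""
    total = sum(views.get(user, {}).values())
    return views.get(user, {}).get(item, 0) / total if total else 0.0


def build_profiles(weight=0.5):
    """Return UP as an m x n list of lists; weight blends implicit and explicit scores."""
    return [
        [
            weight * implicit_score(u, i)
            + (1 - weight) * ratings.get(u, {}).get(i, 0.0)
            for i in items
        ]
        for u in users
    ]


UP = build_profiles()
for user, row in zip(users, UP):
    print(user, [round(score, 2) for score in row])
```

Setting `weight=1.0` gives a purely implicit profile and `weight=0.0` a purely explicit one, which makes it easy to compare the single-mode profiles with the blended one.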
In research by Teevan et al., a client-profiling agent was given information such as browsing history, documents stored on the computer and email history. They found that the more information available to the agent, the better the profile performed in providing results that matched the user's intentions. In addition, results produced with a profile outperformed those of a non-personalized search [18]. Results from several other researchers back up the usefulness of the data collected by user-tracking tools.
IV. Literature Review
Cookies:- Cookies are small text files stored in the browser of a computer when a person visits a website [2]. When a user visits a website, a cookie is sent to the client's browser to uniquely identify it, and the browser sends it back with the requests the user makes [2]. Cookies usually contain a serial number that uniquely identifies a user. Once placed on a user's computer, they track the user's activity on the website and send this information back to the website owners [6]. Whenever the user returns to the website, the web server uses the unique identifier to retrieve the user's record from its database. Cookies are generally used for session handling, authentication, identification of clients and storage of site preferences. There are two classes of cookies: session cookies and persistent cookies.
Session cookies (normal cookies) are data saved by a website onto the user's computer during a visit, and they reside there only for the duration of the browsing session [12]. When the browser is closed, they are deleted automatically. They do not store any personal information. They are used by commercial websites, mostly for shopping-cart functionality [9].
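In HTTP terms, whether a cookie lives only for the session or persists comes down to whether the server attaches an expiry (`Expires` or `Max-Age`) when setting it. The sketch below uses Python's standard `http.cookies` module to build both kinds; the cookie names `cart` and `visitor_id` and their values are hypothetical.

```python
from http.cookies import SimpleCookie


def build_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value; max_age=None yields a session cookie."""
    jar = SimpleCookie()
    jar[name] = value
    jar[name]["path"] = "/"
    if max_age is not None:
        # An explicit lifetime (in seconds) makes the cookie persistent.
        jar[name]["max-age"] = max_age
    return jar[name].OutputString()


def is_persistent(header_value):
    """True if the cookie carries an expiry and so survives a browser restart."""
    jar = SimpleCookie()
    jar.load(header_value)
    return any(bool(m["max-age"] or m["expires"]) for m in jar.values())


session = build_cookie("cart", "abc123")
persistent = build_cookie("visitor_id", "42", max_age=60 * 60 * 24 * 365)

print(session, "->", is_persistent(session))
print(persistent, "->", is_persistent(persistent))
```

The one-year lifetime is an arbitrary choice for the example; real sites pick whatever retention period suits them.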
Tracking cookies (persistent cookies), on the other hand, are a specialized type of cookie that can be shared by multiple websites. They remain stored in the user's browser even after the browsing session ends [12]. Cookies of this type are used to identify individual users and by website owners to analyze users' surfing habits on their websites [17]. Cookies keep track of which advertisements the user has already seen on the site; personal information, however, is not generated by the cookies but comes from the user's own input through order forms, registration pages, payment pages and other online forms [10][12].
Flash cookies are stored by the Adobe Flash plugin and usually back up data from regular cookies. If a user deletes regular cookies, the Flash cookies still hold the data, so a website that placed a cookie on a client's computer can still recognize the user after the cookie is deleted, as long as it is backed up in a Flash cookie [3].
Cookies have several benefits for web users. They remember personal information (such as name, address, payment information and email), so users do not need to refill website forms or repeat the same tasks [10]. Cookies remain on a user's computer for a long time, making a website easier for the client to access [10]. However, depending on a website's policy, the data collected by cookies may be sold to third parties such as marketing firms, advertisers and junk mailers [13].
Cookies can be a very powerful tool for tracking a particular user. Because cookies remember the online footprint of a returning user, obtaining the client computer's IP address and applying geo-location tools (such as JavaScript or the Google Maps API) allows the user's physical location to be matched with his or her online profile or footprint.
Scripts:- Code that runs in a web browser is one of the most powerful tools used to track user activity online.
These scripts are based on JavaScript. They can be client-side JavaScript or server-side JavaScript implementations whose output is translated into browser-readable language or scripts. JavaScript was created in 1995 to make the browser more interactive. Since then it has become a language used in network programming, game development and the creation of mobile and desktop applications [4].
Marketers add JavaScript to their websites, through the source code or the template of their web pages, to collect visitor data. When a visitor views a page for the first time, a script is sent that generates browser data for storage on the client computer. This data is stored for a long time, so it can identify the user on a return visit. To store such data, JavaScript relies on cookies [15].
In server-side JavaScript implementations, the code is processed by the server and the result is sent to the browser for further processing. This code collects location data, social network activity, music selections, movie preferences and user behaviour. The data is sent back to the marketers or social networks, where it is stored, analyzed and put to various uses [15]. Because of the vast amount of data JavaScript can collect, privacy activists and organizations fight to curtail its excessive use for tracking users online [13].
One advantage of using JavaScript for user tracking is that it mostly does not obstruct the user experience on the website. It often enhances the experience by tailoring products to the user's profile, and it sometimes provides useful suggestions of related products or information the user might need [4].
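The first-visit and return-visit logic described for tracking scripts can be sketched in a few lines. The sketch is written in Python rather than JavaScript only to keep it self-contained, and it is a simulation: `client_storage` stands in for the browser's cookie store, `known_visitors` for the tracker's database, and both names are hypothetical.

```python
import uuid


def handle_visit(client_storage, known_visitors):
    """Simulate one page view and report whether the visitor has been seen before."""
    visitor_id = client_storage.get("visitor_id")
    if visitor_id is None:
        # First visit: generate an identifier and store it on the client.
        visitor_id = uuid.uuid4().hex
        client_storage["visitor_id"] = visitor_id
    # Count the visit on the tracker's side.
    known_visitors[visitor_id] = known_visitors.get(visitor_id, 0) + 1
    returning = known_visitors[visitor_id] > 1
    return visitor_id, returning


browser = {}   # one user's browser storage
tracker = {}   # the site's record of identifiers seen so far

first_id, first_returning = handle_visit(browser, tracker)
second_id, second_returning = handle_visit(browser, tracker)
print(first_returning, second_returning, first_id == second_id)
```

Clearing `browser` between visits makes the second call look like a new visitor, which is why deleting stored identifiers weakens this kind of tracking.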
In addition, as browsers become faster and more stable, JavaScript code can run very quickly in the background while the user carries on with his or her web activities [1][4]. Furthermore, the user retains control over these scripts: by customizing the browser's JavaScript preferences or installing tools such as NoScript, the user can block unwanted scripts [15][17][20].
Scripts do, however, have several disadvantages. They give hackers and malicious script authors an opportunity to run potentially harmful code on a client's computer. Browser vendors aim to limit this problem by running scripts in a sandbox and restricting sites to a same-origin policy [1].
Social Networks Tracking: This covers both online social networks (OSN) and mobile online social networks, that is, social services such as Facebook, MySpace, Twitter, Instagram and many more. With the help of these immense social services, individuals have been able to share personal information with a number of entities, such as companies, events and public places, as well as their current location through the Geo location API, the Geo latitude API and the check_In plug-in [15][17]. These functions are supported by browsers such as Chrome, Internet Explorer, Firefox, Safari and Opera, but geolocation is much more accurate on devices with GPS compatibility, such as smartphones. Geo location and Geo latitude rely on web technologies such as JavaScript, HTML, CSS and PHP, and work with Google Maps to determine a user's position during tracking [13][17].
Keyloggers: As the name suggests, key-loggers log the keys typed on users' computers. They can take the form of software or of a hardware device [11]. Legitimate programs use key-logging functionality to capture hotkeys and provide additional functionality to the user.
Key-loggers can also be used illegitimately. Such key-loggers run invisibly while recording every keystroke made. They also log browsing activity, applications used and screenshots of the computer [11]. Key-loggers are sometimes installed as spyware via bugged or cracked software: the user believes he is installing legitimate software but inadvertently installs the key-logger, which sets itself up separately and silently. Legitimate key-loggers, by contrast, are installed along with bundled software to enhance the user experience. An advantage of using key-loggers to track users is that the data collected from the client's system can be used to improve products and services, such as auto-complete or spell-check features [11][15].
Server Logs: A server log file is one of the tools websites employ to track the activities of their users online or on the website [7]. The log is a file (or sometimes several files) created by the server that records all activities and requests performed by the user. Whenever a user visits a website, the web server automatically collects information on the new user. Typical server log information includes the client's IP address, referrer link, operating system version, user agent, page requested and the time and date of the request [7].
Figure: A typical server log [7]
While server logs do not typically collect information on specific users, the information they provide can be used to track a user's browsing activity or patterns. It is important to note that users have no control whatsoever over the data collected by web servers, which raises ethical and legal questions about data mining [13]. Server logs are accessible only to the web administrator or webmaster. The information gathered from them is typically used to analyze web traffic patterns, URL referrers or user agents.
Data logging by servers has both advantages and disadvantages. One advantage is that it aids in resolving issues. When a customer has problems using an e-commerce website, a little information from the customer is enough to resolve technical problems, thanks to the abundant information in the server log(s) [7].
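To make the server-log fields concrete, the sketch below parses a few lines in the widely used Apache "combined" log format and reconstructs one client's browsing trail. The log lines are invented for the example (documentation-range IP addresses, `example.com` URLs); they are not taken from [7], and real servers may be configured with a different format.

```python
import re
from collections import Counter

# Combined log format: IP, identity, user, [time], "request", status, size, "referrer", "agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample entries.
SAMPLE_LOG = """\
192.0.2.10 - - [10/Mar/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "Mozilla/5.0 (Windows NT 6.1)"
192.0.2.10 - - [10/Mar/2015:13:56:02 +0000] "GET /products HTTP/1.1" 200 5120 "http://example.com/index.html" "Mozilla/5.0 (Windows NT 6.1)"
198.51.100.7 - - [10/Mar/2015:13:57:11 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"
"""


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


entries = [e for e in map(parse_line, SAMPLE_LOG.splitlines()) if e]

# Requests per client IP, and the ordered page trail of one client.
requests_by_ip = Counter(e["ip"] for e in entries)
trail = [e["path"] for e in entries if e["ip"] == "192.0.2.10"]

print(requests_by_ip)
print(trail)
```

Even without a name attached, the combination of IP address, user agent and page sequence is enough to follow one client across requests, which is the tracking concern raised above.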
Besides, marketers use the log file to monitor trends and tailor products and services to meet end-user demand. In addition, the information in server logs allows webmasters and system administrators to fix loopholes and ensure optimum connectivity for their web clients [6][7].
On the other hand, the lack of transparency about what data is logged can raise privacy and, subsequently, legal issues. Users have no control over how companies process information obtained through server logs [13]. While a Terms of Service document may provide answers, in some situations no such document exists [13]. Furthermore, since users have no control over how organizations process log data, user-tracking information such as IP addresses or page requests can be sold to third parties without the end client's consent. Data collected from server logs can easily be stored in a database for further analysis or use [13].
Server log analytics software includes Google Analytics, Deep Log Analyzer, AWStats, Piwik and Webalizer. These tools provide several indicators to marketers or website owners [7].
Scenario: A growing online marketplace with about 3,000 users, regarded as one of the leading companies providing internet services that let users earn an income from home using a PC and an internet connection.
V. Proposed Method
The proposed method involves steps to obtain user-tracking information. Information such as usernames, passwords, cookie data, browsing activity and IP addresses is gathered and stored in log files. The unsuspecting user is attacked without his knowledge. While browsing the internet, he may fail to notice that the certificates sent to his computer are spoofed certificates self-signed by the attacker, although anti-virus software and modern browsers may warn him.
If the user fails to heed the warnings and continues browsing, all data between the user and the internet passes through the attacker's computer, with no further warning given to the user.
Fig: 2 Logical Representation:
Our sample user is Benedict, a student of Middlesex University. His web surfing session begins when he starts browsing and ends when he quits the browser. He is about to browse Facebook, check his email on Yahoo and view course materials on Unihub.
Figure __: Benedict has to provide Google with information such as his name, email address, mobile phone number, date of birth and gender.
Benedict provides his username and password on all three websites. Meanwhile, on another computer, the attacker, Ivan, uses Cain and Abel to scan the wireless network for computers available for exploitation. In Cain and Abel, Ivan resolves the host names of the captured systems and sees Benedict's computer online. He begins his man-in-the-middle attack by choosing to capture data between the wireless router and Benedict's system. This mode of attack tricks Benedict's computer into thinking Ivan's
computer is the router, while the router thinks Ivan's computer is Benedict's computer. He then starts poisoning traffic between the router and Benedict. When Benedict clicks the log-in button after entering his username and password, this data is captured in plain text and can be seen by Ivan on his computer. A cookie is placed on Benedict's computer upon login. This cookie is also captured by Ivan and stored in several log files. Besides using the username and password to log in to any of Benedict's sites, Ivan uses the Cookie Manager Firefox add-on to replace his own Facebook cookie data in Firefox with Benedict's. Thereafter, when Ivan accesses Facebook, he is recognized as Benedict and can masquerade as him.
Conclusion
In conclusion, this report has analyzed how users are tracked on the internet. It highlighted the advantages and disadvantages of user tracking and briefly outlined what the information from user tracking is used for. Finally, it described a sample exercise in which a man-in-the-middle attack was conducted to gather user-tracking information passing over the internet. The exercise was successful, and the attack can be replicated.
Bibliography
[1] ADsafe, Making JavaScript Safe for Advertising, 2015.
[2] Allaboutcookies, All About Computer Cookies - privacy concerns on cookies, 2015.
[3] M. Brinkmann, Flash Cookies explained - gHacks Tech News, 2007.
[4] D. Flanagan and P. Ferguson, JavaScript, 5 ed., O'Reilly, 2006.
[5] S. Gauch, M. Speretta, A. Chandramouli and A. Micarelli, "User profiles for personalized information access," The adaptive web, vol. 1, no. 2, pp. 54-89, 2007.
[6] Java Republic, Privacy Policy - Java Republic, 2015.
[7] L. Joshila Grace, V. Maheswari and D. Nagamalai, "Analysis of Web Logs And Web User In Web Mining," International Journal of Network Security & Its Applications, vol. 3, no. 1, pp. 99-110, 2011.
[8] N. Kamaraj and M. Chandran, "Tracking Down Travel Agencies Geo-location using Software Engineering," IJCTT, vol. 9, no. 2, pp. 49-52, 2014.
[9] J. P. Kesan and R. C. Shah, "Deconstructing Code," Yale Journal of Law & Technology, vol. 6, pp. 277-389, 2004.
[10] D. M. Kristol, "HTTP Cookies: Standards, privacy, and politics," ACM Transactions on Internet Technology, vol. 1, no. 2, pp. 151-198, 2001.
[11] M. Kusuma-Atmadja, "Some Thoughts on ASEAN Security Co-Operation: An Indonesian Perspective," Contemporary Southeast Asia, vol. 12, no. 3, pp. 161-171, 1990.
[12] Microsoft, Description of Persistent and Per-Session Cookies in Internet Explorer, 2007.
[13] A. D. Miyazaki, "Online Privacy and the Disclosure of Cookie Use: Effects on Consumer Trust and Anticipated Patronage," Journal of Public Policy & Marketing, vol. 27, no. 1, pp. 19-33, 2008.
[14] B. Mobasher, R. Cooley and J. Srivastava, "Automatic personalization based on Web usage mining," Commun. ACM, vol. 43, no. 8, pp. 142-151, 2000.
[15] OpenTracker, How does user-tracking work?, 2015.
[16] L. M. Quiroga and J. Mostafa, "Empirical evaluation of explicit versus implicit acquisition of user profiles in information filtering systems," in Proceedings of the fourth ACM conference on Digital libraries, ACM, 1999, pp. 238-239.
[17] N. Schmuker, Web Tracking, 1 ed., Berlin University of Technol, 2011, pp. 1-3.
[18] J. Teevan, S. T. Dumais and E. Horvitz, "Personalizing search via automated analysis of interests and activities," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2005, pp. 449-456.
[19] Wikipedia, HTTP cookie, 2015.
[20] H. Xia and J. C. Brustoloni, "Hardening web browsers against man-in-the-middle and eavesdropping attacks," in Proceedings of the 14th international conference on World Wide Web, ACM, 2005.