A personal privacy tool for anti-stalking
Guang Mo
Computing and Software Department, McMaster University
I. INTRODUCTION
With the widespread use of the Internet and the rising popularity of social media, personal information has become widely available on the web. This has created opportunities for people who want to obtain information about someone else. Public information about a person, such as their name or occupation, is generally not a concern. However, personal information on the web often includes private details such as contact email addresses, phone numbers, or even home addresses. The problem becomes more severe when we consider the effort an individual must spend to protect their own privacy. People are usually busy, and finding out what information about them is available on the web is time-consuming: they must query a search engine and open every returned result. This takes a considerable amount of time, because each page has to be examined to find out whether any personal information has been leaked. In addition, web results change dynamically over time, so an individual who wants to protect their private information has to repeat this process periodically.
II. EXISTING SOLUTIONS
Currently, there is no direct solution to this problem from the user's perspective, i.e., for a user who wants to find out which parts of his or her information are available on the web. However, many solutions exist for the opposite use case: finding out someone else's information. 411.com, thatsthem.com, and ZabaSearch are all examples of services that let someone look up private personal information about other people. One may argue that users could simply search for themselves with these services to see what information has been leaked. However, these services are limited to certain residential regions and rely on privately owned data. For people who live outside the U.S., these services are not functional. Moreover, privately owned data is out of the scope of this problem, since the user is interested only in private information that has been leaked publicly.
III. SOLUTION AND APPROACH
To tackle this problem, its key aspects are listed below:
- Time-consuming process
- Publicly available information
- Dynamic search result environment
To address these, an automated data mining tool is proposed.
The process is time-consuming mainly because of the human effort involved in examining each returned search result. Typically, users have to examine each page to see whether it contains their phone number, email address, home address, etc. This process consists of clear, repeatable steps, so it can be automated. Thus, the tool automates the time-consuming examination of search results in two steps:
1. Ask the user for sensitive information, such as their name and any other private information.
2. Conduct the search, and perform data mining on the results to look for the sensitive information.
To maximize efficiency, the tool also alerts the user when any sensitive information has been leaked. Users therefore need not pay any attention if no alert is raised, and when an alert is received they have direct access to the sources to find out where their information has been leaked. The search scope of the tool is restricted by location, currently to Canada only. By limiting the search scope, the accuracy of search results can be improved significantly. For other locations, please contact the author.
With this privacy tool, the problem is addressed: users no longer need to spend much time finding out which parts of their information have been leaked, and they can revisit the page periodically to handle the dynamic search result environment.
IV. CONTRIBUTIONS
The key contribution of this project is an automated privacy tool focused on protecting personal private information. Users benefit directly from its efficiency in identifying information leaked on the web, which draws their attention to their private information. Once leaked information is identified, users can take action immediately; for example, if they have accidentally posted private information on a site, they can then delete it. At present, most of the public does not pay much attention to the importance of their private information online. By easing the process of finding personal information leaks online, this tool can hopefully raise public awareness of personal privacy.
V. DESIGN AND IMPLEMENTATION
The design of the privacy tool consists of five main components, as depicted in Figure 1.
Fig. 1. Design of privacy tool
A. Input UI
This user interface is the first layer of the tool that the user sees. It simply asks for the user's name and, optionally, any of their email address, phone number, etc. The key design consideration for this component is simplicity, as potential users are looking for efficiency when they choose to use this tool.
B. Search Query Generator
This component adds further value for users, who might neglect some possible queries when searching by themselves. For example, they might search only for their names, when in fact searching for their email addresses may also reveal relevant personal information. The component is entirely transparent to users. The Search Query Generator is responsible for generating all possible combinations of any two given inputs. Combinations are limited to two inputs because, in initial experiments, queries combining three or more inputs rarely returned any results. Queries containing only a single input are also included.
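The combination rule described above (every single input plus every pair of two inputs) can be sketched as follows. This is a minimal illustration, not the author's code: the input field names are hypothetical, and wrapping each input in double quotes so the search engine treats it as an exact phrase is an assumption.

```javascript
// Sketch of the Search Query Generator: every single input plus every
// unordered pair of two inputs. Field names (name, email, phone) are
// hypothetical; quoting each term as a phrase is an assumption.
function generateQueries(inputs) {
  // Keep only non-empty inputs and quote each one as a phrase.
  const terms = Object.values(inputs)
    .filter((v) => typeof v === "string" && v.trim() !== "")
    .map((v) => `"${v.trim()}"`);

  const queries = [];
  // Queries containing a single input.
  for (const t of terms) queries.push(t);
  // Queries combining any two inputs (no three-way combinations).
  for (let i = 0; i < terms.length; i++) {
    for (let j = i + 1; j < terms.length; j++) {
      queries.push(`${terms[i]} ${terms[j]}`);
    }
  }
  return queries;
}

// Example usage with made-up data.
const queries = generateQueries({
  name: "Jane Doe",
  email: "jane.doe@example.com",
  phone: "905-555-0100",
});
console.log(queries.length); // 6
```

In general, n non-empty inputs yield n + n(n-1)/2 queries. With four inputs (for example name, email, phone, and address) that is 4 + 6 = 10 queries, so the free tier's 100 calls per day would cover roughly ten complete checks, assuming one call per query.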
C. Search Processing
The Search Processing component is responsible for conducting searches on behalf of users. It takes the queries from the Search Query Generator one at a time and stores the outputs accordingly. Initially, various search engines were considered. However, search engines on the market prohibit automated (non-human) searches. As a result, Google was chosen for its popularity and its open search API (Google Custom Search Engine). This decision also introduced some constraints to the project, such as the number of search calls allowed per day: the free version of the Custom Search Engine allows only 100 search calls per day. Nevertheless, the overall result of Search Processing is satisfactory owing to the accuracy of its results.
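A minimal sketch of Search Processing against the Google Custom Search JSON API is shown below. The endpoint and the `key`, `cx`, `q`, and `cr` parameters belong to that API; the API key and engine ID are placeholders, and using `cr=countryCA` to implement the Canada-only scope is an assumption about how the location restriction could be applied. The `planBatch` helper, a hypothetical name, keeps the tool within the 100-calls-per-day free quota, and `fetchFn` is injected (for example the global `fetch` in a browser or Node.js 18+).

```javascript
// Sketch of Search Processing. API key and engine ID are placeholders;
// cr=countryCA for the Canada-only scope is an assumption.
const CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1";
const FREE_DAILY_LIMIT = 100; // free-tier quota stated in the paper

function buildSearchUrl(query, apiKey, engineId) {
  const params = new URLSearchParams({
    key: apiKey,
    cx: engineId,
    q: query,
    cr: "countryCA", // restrict results to documents from Canada (assumed)
  });
  return `${CSE_ENDPOINT}?${params.toString()}`;
}

// Split queries into those that fit in today's remaining quota and
// those that must wait, so the daily limit is never exceeded.
function planBatch(queries, usedToday, dailyLimit = FREE_DAILY_LIMIT) {
  const remaining = Math.max(0, dailyLimit - usedToday);
  return {
    runToday: queries.slice(0, remaining),
    deferred: queries.slice(remaining),
  };
}

// Run queries one at a time and store each output with its query.
async function runSearches(queries, apiKey, engineId, fetchFn) {
  const outputs = [];
  for (const q of queries) {
    const res = await fetchFn(buildSearchUrl(q, apiKey, engineId));
    const body = await res.json();
    // Result items carry title, link, and snippet; "items" is absent
    // when a query returns no results.
    outputs.push({ query: q, items: body.items || [] });
  }
  return outputs;
}
```

Processing the queries sequentially rather than in parallel mirrors the paper's one-at-a-time description and makes it straightforward to stop once the day's quota is used up.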
D. Information Fetcher
The Information Fetcher is responsible for all data mining performed on the search results returned by Search Processing. A single returned search result usually contains a lot of information, both relevant and irrelevant; for example, it may contain results not only about you but also about something else, such as a place. Therefore, a filter is applied first so that the Information Fetcher fetches only relevant results. The filter's passing criterion is an exact match on the input keys: results that contain only variations of an input are filtered out. Once only relevant results remain, the Information Fetcher checks whether any of the other provided inputs can be found in them. One of two outcomes follows. If any other provided input is found, a flag is set to alert the user, and the source of the page is recorded. If no other provided input is found, the flag is not set, but the source of the page is still recorded in case users want to see where their names are listed.
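The filter-then-flag logic above can be sketched as follows. Several details are assumptions rather than the paper's specification: each search item is assumed to have `title`, `link`, and `snippet` fields (as in Custom Search JSON API responses), the name is treated as the required filter key, and "exact match" is read as a case-insensitive match bounded by non-word characters, so that a variation such as "Jane Does" is filtered out. The function names are hypothetical.

```javascript
// Sketch of the Information Fetcher. Item fields (title, link, snippet),
// using the name as the filter key, and the word-boundary reading of
// "exact match" are all assumptions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if value appears in text as a whole token sequence (case-insensitive).
function containsExact(text, value) {
  const pattern = new RegExp(`(^|\\W)${escapeRegExp(value.trim())}(\\W|$)`, "i");
  return pattern.test(text);
}

function fetchInformation(items, inputs) {
  const { name, ...others } = inputs; // name is the required input
  const otherValues = Object.entries(others).filter(
    ([, v]) => typeof v === "string" && v.trim() !== ""
  );
  const records = [];

  for (const item of items) {
    const text = `${item.title || ""} ${item.snippet || ""}`;
    // Filter: keep only results that contain the name exactly.
    if (!containsExact(text, name)) continue;

    // Which other provided inputs also appear in this result?
    const leaked = otherValues
      .filter(([, v]) => containsExact(text, v))
      .map(([field]) => field);

    // Record the source either way; set the alert flag only on a leak.
    records.push({ source: item.link, alert: leaked.length > 0, leaked });
  }
  return records;
}
```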
E. Output UI
Lastly, the Output UI is responsible for informing the user of the results. A table with the relevant information is displayed. If the Information Fetcher is in the alert state, an alert is shown to the user, and the table reflects the flagged information.
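As an illustration of the Output UI, the sketch below turns fetcher records into an HTML table and decides whether the alert should be shown. The record shape `{ source, alert, leaked }` and the function names are hypothetical. Source URLs come from third-party pages, so they are HTML-escaped before being placed in the table.

```javascript
// Sketch of the Output UI. The record shape ({ source, alert, leaked })
// is hypothetical; third-party text is escaped before insertion.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderResults(records) {
  const rows = records
    .map((r) => {
      const status = r.alert ? `Leaked: ${r.leaked.join(", ")}` : "Name only";
      return `<tr><td>${escapeHtml(r.source)}</td><td>${escapeHtml(status)}</td></tr>`;
    })
    .join("");
  const html = `<table><tr><th>Source</th><th>Status</th></tr>${rows}</table>`;
  // The alert state is on if any record was flagged.
  const showAlert = records.some((r) => r.alert);
  return { html, showAlert };
}
```

In a browser page, the returned `html` could be assigned to a container's `innerHTML`, and `showAlert` used to decide whether to call `window.alert`.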
F. Implementation
The detailed implementation is not discussed here. The tool is a web application built with Node.js and the Express framework, and most of the implementation simply follows the workflow described in the Design section. Since the tool is not intended to keep any user information, minimal effort was put into the server side; most of the work is done on the client side in JavaScript and HTML. The tool is currently hosted on IBM Bluemix and can be accessed via http://anti-stalking.mybluemix.net.
In addition to the code, one further component had to be set up and configured. As mentioned earlier, the search engine is provided by Google as the Google Custom Search Engine, which must be configured accordingly for the tool to function.
VI. RESULTS
In around 50 trials, the results were satisfactory, with reliable accuracy. However, the search engine behaves slightly differently from the regular Google search engine. As discussed earlier, Google prohibits automated searches and only allows automatic searches through the Custom Search Engine. Although the Custom Search Engine was configured to "search through the entire web", its returned results were found to differ from those of standard Google Search on the web, and as a consequence fewer relevant results are returned. Since the main objective of this tool is to determine whether any private information has been leaked, manual verification was also performed. The sources reported by the tool were 99 percent correct, and the alerts reported by the tool were consistent with results obtained directly from the regular Google search engine.
VII. CONCLUSION AND FUTURE WORK
Overall, the tool has met expectations by providing user-friendly UIs and accurate results. Users can easily find out which parts of their private information have been leaked, and they are informed if any sensitive information appears in the search results. However, there is still plenty of room for improvement. To meet the need for periodic checks for personal information leaks on the web, the tool could be exposed as an API, so that larger-scale privacy checking products or similar software could use it. In addition, the search engine should be made configurable to meet the needs of different regions of the world; for example, people in China may not have direct access to Google Search, so an alternative search engine should be selectable via the tool. Although improvements can be made, this tool has demonstrated the feasibility of a fully automated personal privacy checking tool, which may help raise public interest in personal information protection.
