Missouri University of Science and Technology
Ethical Issues with Customer Data Collection
Dr. David Spurlock
The paper discusses topics related to:
Data mining: collection and analysis of large amounts of data.
Data mining is a branch of computer science that deals with processing large-scale data to extract
previously unknown, interesting patterns. The objective is to extract the relevant
information from large volumes of data.
Ethical issues related to data mining and how it impacts web miners and web users.
Data mining poses a threat to information privacy because an individual’s or group’s personal
information is freely available. Individuals should be told the purpose of the
data collection, who receives the data, and what its implications are. Ethical
data mining is nevertheless acceptable: it refers to the use of individual data in accordance
with privacy rules and established standards.
Defines the fine line between ethical and unethical usage of data mining.
Although the impact of web-data mining should be a concern for every web user, there is no reason
for people to panic. This technique is not yet being used to its full potential.
There is, however, no clear indication of web data being misused to an extent that people are
Data Mining involves six stages:
1. Detection: In this stage any noticeable difference in the data patterns is detected. This stage is
crucial, since the quality of the data collected affects the output.
2. Dependency modelling: The relationship between the variables is found such as the buying trends
of a particular age group, effect of tax reduction on savings, sales effect on sale of goods due to
3. Clustering: Clustering is a process of partitioning a set of data (or objects) into a set of meaningful
sub-classes, called clusters. This helps in understanding the natural grouping or structure in a data
4. Classification: Data classification is the classification of data based on its level of sensitivity. The
classification of data helps determine what baseline security controls are appropriate for
safeguarding that data.
[Figure; only fragments of its labels survive: information collected through web navigation history
(e.g. identifying the IP address and relating it to the Internet service provider (ISP) in order to
obtain more specific data such as names, addresses, phone ...); data collected from a user (generally
used to characterize ...); information collected when the user gives information in order to access
certain benefits (e.g. login information).]
5. Regression: A statistical approach to forecasting change in a dependent variable (sales revenue,
for example) on the basis of change in one or more independent variables (population and income).
6. Summarization: Summarization is a key data mining concept that involves techniques for
finding a compact description of a dataset. Data summarization gives data
consumers a generalized view of disparate bulks of data.
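Three of the stages above (clustering, regression and summarization) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample figures are made up, and the clustering shown is a plain one-dimensional k-means, one common technique among many, not a method the paper prescribes.

```python
from statistics import mean, median


def kmeans_1d(values, k, iterations=20):
    """Clustering: partition numbers into k groups around moving centres."""
    ordered = sorted(values)
    # Start the centres at evenly spaced positions in the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def summarize(values):
    """Summarization: a compact description of a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Hypothetical sample data (not from any real customer data set).
ages = [19, 21, 22, 24, 45, 47, 51, 52]   # customer ages
income = [20, 30, 40, 50]                 # independent variable, in thousands
sales = [5, 7, 9, 11]                     # dependent variable, in thousands

age_groups = kmeans_1d(ages, k=2)         # two natural age groups
slope, intercept = fit_line(income, sales)
sales_summary = summarize(sales)
```

With these made-up figures the ages fall into a younger and an older group, and the fitted line describes how sales change with income.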
METHODS OF DATA MINING
1. WEB TRACKING
Web tracking refers to companies tracking consumers’ behavior across the Web without their
consent and without providing them any recognizable value.
Behavioral audience targeting, like content targeting, sponsored advertorials, pre-rolls and every other
ad-product available in digital environments, serves content creators. Keeping content creators in
business serves consumers, giving them a myriad of digital environments to explore.
In many cases, however, advertisers misuse behavioral data, and this is unethical. Third-party
companies with no direct relationship to the consumer track consumers across numerous
websites, create profiles of their behavior, and profit from information that they never asked
permission to collect. This is the ethically problematic issue.
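The profile building described above can be sketched as follows. This is a minimal illustration, not how any particular company works: the class name, the cookie IDs and the site names are all hypothetical. It assumes only that the same third-party script is embedded on several unrelated sites and sees the same cookie ID on each.

```python
from collections import defaultdict


class ThirdPartyTracker:
    """Hypothetical ad network whose script is embedded on many unrelated sites."""

    def __init__(self):
        # cookie ID -> list of (site, page) visits seen by the tracker
        self.profiles = defaultdict(list)

    def record(self, cookie_id, site, page):
        # Called whenever a page embedding the tracker is loaded.
        self.profiles[cookie_id].append((site, page))

    def sites_seen(self, cookie_id):
        # The cross-site profile: every site where this cookie ID appeared.
        visits = self.profiles.get(cookie_id, [])
        return sorted({site for site, _ in visits})


tracker = ThirdPartyTracker()
tracker.record("abc123", "news.example", "/politics")
tracker.record("abc123", "shop.example", "/baby-strollers")
tracker.record("abc123", "health.example", "/pregnancy-tips")
tracker.record("zzz999", "news.example", "/sports")

profile = tracker.sites_seen("abc123")
```

None of the three sites knows what the visitor did on the other two, but the third party sees all of the visits, which is why the lack of a direct relationship and of consent matters.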
Reasons for web tracking:
To Boost Marketing Capabilities
Law Enforcement and Intelligence
2. SURVEYS
A survey is a research method for collecting information from a selected group of people using
standardized questionnaires or interviews. There are numerous survey research methods to obtain
customer preferences and likes, such as:
BIG DATA PERSPECTIVES
Large collections of data can be viewed from several perspectives.
As a technology innovation: To accomplish its purpose, data mining must effectively address three
concerns. First, volume: the progress of storage alternatives. Second, velocity: easy access in
real time. Third, variety: current data is mostly unstructured, so its exact use is difficult to establish
given the large amount of data and the many possible analyses.
As a commercial value: The use of data generates value through the identification of complex patterns in
real time (the foundation of market research) and the prediction of quality issues.
As a matter of privacy: To protect privacy, there must be a balance between the use of data and the
following factors. Collection: data sets analyzed independently may carry no privacy implications,
but combined they can threaten privacy. Security: personal data can be hacked and stolen. High volume and
velocity: data has to be analyzed automatically (there is no time to wait for consent). Significance:
organizations are far from being able to use all the data they collect.
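The point that data sets are harmless alone but risky in combination can be illustrated with a small record-linkage sketch. Everything here is a made-up assumption for illustration: the records, the field names, and the choice of ZIP code, birth year and gender as the shared attributes (often called quasi-identifiers).

```python
# Data set A: "anonymous" survey answers, with no names (hypothetical).
survey = [
    {"zip": "65401", "birth_year": 1990, "gender": "F", "answer": "has a chronic illness"},
    {"zip": "65401", "birth_year": 1985, "gender": "M", "answer": "no health issues"},
]

# Data set B: a public directory, with no sensitive attributes (hypothetical).
directory = [
    {"name": "Alice Example", "zip": "65401", "birth_year": 1990, "gender": "F"},
    {"name": "Bob Example", "zip": "65401", "birth_year": 1985, "gender": "M"},
    {"name": "Carol Example", "zip": "65409", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link(records_a, records_b, keys=QUASI_IDENTIFIERS):
    """Join two data sets on shared attributes, keeping only unambiguous matches."""
    index = {}
    for rec in records_b:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    linked = []
    for rec in records_a:
        matches = index.get(tuple(rec[k] for k in keys), [])
        if len(matches) == 1:
            # Exactly one person fits: the "anonymous" answer now has a name.
            linked.append({**matches[0], **rec})
    return linked


reidentified = link(survey, directory)
```

Neither data set contains a name next to a sensitive answer, yet the join attaches one to the other for every person whose combination of attributes is unique.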
These perspectives outline the direction in which data mining is evolving, and it is safe to
assume that there is no going back in the way information is being used.
Taking the three points of view together, it is likely that progress in the first two
(technology innovation and commercial value) is linked to the use of big data as a matter of privacy,
based on how personal information is analyzed and how consumer relationships are built, bearing in mind
security implications within individuals’ social interaction through the use of personal technological
ETHICAL CHALLENGES IN CUSTOMER DATA HANDLING
Information privacy is defined as the relationship between collection and dissemination of data, technology,
the public expectation of privacy, and the legal and political issues surrounding them.
Data mining poses a threat to information privacy because an individual’s or group’s personal information is
freely available. Individuals should be told the purpose of the data collection, who
receives the data, and what its implications are.
The following are some issues in the application of data mining for commercial value, with the ethical
challenges they raise:
The social graph: Deduced from social networking (information given voluntarily), it is the picture built
of group-level interactions and the nature of the bonds that bring people together.
Ethical challenge: Ambiguity. The group picture is uncertain because it may label as friends people
with weak social ties, who are not representative of physical-world life.
Ownership of data: Instead of being collected by government entities or the traditional large companies,
data is collected by high-technology companies such as Facebook, Google, and Twitter, among others.
Ethical challenge: Some data owners promise not to sell the data now, but as data mining evolves
into a valuable technology, this can change in the future as a consequence of
changes in data-use policies.
Data memory: Data collected and stored can be recalled and analyzed in the future.
Ethical challenge: Stored information about an individual’s life can bring back past behaviors (e.g. a
Facebook timeline can be a disadvantage for a person who used to party very frequently and is now
searching for a job). Data memory "may remove the ability for individuals to forget and be forgotten".
Passive data collection: Automatic data collection through passive technologies. (E.g. Mobile location
Ethical challenge: It increases the amount of data collected and the number of variables to take into
account in the analysis. Individuals, however, are not aware of it, and even if they authorized the data
collection at first, systems do not ask again each time they collect data.
Respecting privacy in a public world: The use of technologies has become necessary nowadays, and they
are easy to access, offering benefits at low cost (e.g. free apps). However, the use of certain technological
devices involves the collection of information by servers.
Ethical challenge: Individuals can refrain from giving information; however, the use of technology
has become a necessity and an important factor in social interaction, so the paradox is that
deciding not to give information can mean being excluded from the community.
Although these ethical issues are now challenges in the application of this technology, laws and
regulations are gradually being updated in response to concerns about individuals’ privacy. It is important to
highlight that data mining is an emerging practice and is therefore in an adjustment phase. For its
current application, self-regulation is a very important aspect for companies to take into consideration
when dealing with big data.
Below are some recommendations in support of ethical data mining:
1. Verify the data source for authenticity.
2. Consider and respect the expectations of customers.
3. Develop better customer relations.
4. Emphasize ethical data mining.
5. Control unregulated data access and software.
6. Take corrective action against offenders.
CASE STUDIES- CONS
Target Corporation Case:
Target Corporation, a large-scale retailer of consumer goods, assigns every customer a Guest ID number
tied to their credit card, name, or email address, and stores the history of that customer’s purchases along with
other demographic information collected from them or obtained from other sources.
Lots of people buy lotions, but one of Target’s employees noticed that women on the baby registry were
buying larger quantities of unscented lotion around the beginning of their second trimester.
An angry man went into a Target store outside of Minneapolis, demanding to talk to a manager: “My
daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby
clothes and cribs? Are you trying to encourage her to get pregnant?”
The manager, having no idea about the issue, looked at the mailer, which was addressed to the man’s
daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling
The manager apologized and then called a few days later to apologize again. This time, however, the man
said, “I had a talk with my daughter. It turns out there’s been some activities in my house I haven’t been
completely aware of. She’s due in August. I owe you an apology.”
Despite the accuracy of Target Corporation’s data analysis, the teenage girl’s private
life was exposed, which makes this an unethical use of customer behavior data.
LinkedIn Case:
Recently LinkedIn CEO Jeff Weiner admitted that the social networking site was guilty of sending too
many emails to some users.
The "Add Connections" service in LinkedIn lets users import contacts from their email accounts and send
invitations to connect on the site. LinkedIn sends an email invitation
to the contact, and if the person does not respond to the invitation within a certain
amount of time, LinkedIn follows up by sending two more reminder emails.
A class-action suit claimed that LinkedIn repeatedly “spammed” those contacts with unwanted emails even
though LinkedIn members had not consented to the additional emails.
LinkedIn said in an email to its users that anyone who used the service between Sept. 17, 2011, and Oct.
31, 2014, is eligible to file a claim.
The amount each user receives will depend on how many people come forward, but LinkedIn said
each person could receive up to $1,500.
LinkedIn says it has revised its disclosures to clarify that two reminder emails will be sent as part of its
"Add Connections" feature. The company says it will, by year's end, also offer an option to users to cancel
a connection invitation, thereby halting any additional reminder emails from being sent out.
This case is a classic example of ownership of data and passive data collection, both of which pose ethical
challenges to customers’ privacy on the web.
ARGUMENTS TO SUPPORT DATA MINING-PROS
The following arguments respond to the ethical issues discussed above. They are based on an experiment
conducted with professionals who apply web data mining practices in a business context. Their views are as follows:
Web-data mining itself does not give rise to new ethical issues.
Professionals argue that there is nothing new about web-data mining practices as it is just an
extension of old situations to new situations created by computer and information technology. One
first has to clear up the uncertainties, which have to do with understanding what data mining is.
Most of the possible dangers come from group profiling, and since group profiling was done
before data mining techniques were known, the issues could be considered to be old news.
There are laws to protect private information.
This argument cannot be made with conviction, as the law is never fully sufficient with respect to
privacy problems. For instance, current privacy laws only offer protection against the misuse of
identifiable personal data, but there is no legal protection against the misuse of anonymized data used
as if it were personal data. The growing number of online privacy policies is an example of
self-regulating efforts. Such policies, however, are not found on every site. Thus, there are still a lot of
sites that a person who is concerned about online privacy should not visit. In addition, it is not
always an easy task for a web user to thoroughly read the privacy statements on every site he/she
Many individuals simply choose to give up their privacy, so why not use this data?
As people can refuse to give out information about themselves, they possess some power to control
their relationship with organizations. Many individuals simply choose to give up their privacy, and
what can be wrong with collecting public data from the web that is voluntarily given? It is there
for the taking.
Most collected data is not of a personal nature, or is used for anonymous profiles.
So why should there be a privacy problem? An argument often heard is: “Our software is used to
identify crowd behavior of visitors to web sites. Therefore, if we don’t know who you are, how can
we be invading your privacy?”
Web-data mining leads to fewer unsolicited marketing approaches.
Data mining techniques will provide more accurate and more detailed information, which can lead
to better and fairer judgements. So, web-data mining leads to fewer unwanted marketing approaches.
Therefore, why would people complain?
Personalization leads to individualization instead of de-individualization.
Most customers like to be recognized, and treated as a special customer. So it is not considered a
violation of privacy to analyze usage interaction.
Although there are many ethical challenges with respect to data mining, they can be attributed to the
fact that data mining is an emerging technology and the market is still adjusting to its capabilities; there is
no immediate threat to users. It is by no means clear that companies are using unexpected and
non-obvious associations, classifications, clusters, and profiles based on web data as grounds for decision-
The solutions discussed previously can contribute to the responsible and well considered
development and application of web-data mining.
The laws and regulations associated with it are bound to evolve depending on how it is perceived.
There are things that can be done to guide this technique in a socially acceptable direction.
As ethical issues will grow as rapidly as the technology, ethical considerations should be an
integrated and essential part of this development process instead of something at its side.
This is a joint responsibility of both web miners and web users.
Some methods to avoid web tracking:
1. Ensure that the website is safe before sharing any information or filling out any registration forms
2. Ensure that your online accounts in the different websites are configured for providing optimal
3. Use an email provider that has a reliable dedication to the protection of customer privacy.
4. Enhance the privacy of your browser through various add-ons and extensions.
Earley, S. (2014). Big Data and Predictive Analytics: What's New? IT Professional, 13-15.
Retrieved November 16, 2015.
Van Wel, L., & Royakkers, L. (2004). Ethical issues in web data mining. Ethics and Information Technology,
6(2), 129-140. Retrieved November 17, 2015, from
Nunan, D., & Di Domenico, M. (2013). Market research and the ethics of big data. International Journal of
Market Research. http://um9mh3ku7s.search.serialssolutions.com/?ctx_ver=Z39.88-
Moftakhari, M. Ethical issues in data mining. 23 pages. http://ickm2014.bilgiyonetimi.net/wp-
Carr, N. (2010). Tracking is an assault on liberty, with real dangers. The Wall Street Journal.
Harper, J. (2010). It’s modern trade: Web users get as much as they give. The Wall Street Journal.
Hill, K. (2012). How Target figured out a teen girl was pregnant before her father did. Forbes Tech.
Roberts, J. (2015). LinkedIn will pay $13M for sending those awful mails. Fortune.