Topic Learning Outcomes
By the end of this topic, students should be able to:
1. Explain sources of business data
2. Discuss the importance of considering ethical and privacy issues in
collecting and using data
Data Sources
Data for analytics can come from a variety of sources:
- Organizational databases
- Social media
- Publicly available datasets
- Sensor data
- Surveys and market research
Organizational Databases
Organizational databases contain data related to a business and its
operations:
- Products
- Customers
- Transactions
- Suppliers
- Promotions
- Employees
Authorized users can access these data by querying the database, usually
using SQL (Structured Query Language).
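As a rough illustration of querying a database with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers table, its columns, and the sample rows are all hypothetical; a real organizational database would have its own schema and access controls.

```python
import sqlite3

# Hypothetical in-memory database; table, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers (customer_id, name, country) VALUES (?, ?, ?)",
    [(1, "Aisha", "Malaysia"), (2, "Ben", "Singapore"), (3, "Chong", "Malaysia")],
)

# An SQL query: select the names of customers from one country, sorted by name.
rows = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name",
    ("Malaysia",),
).fetchall()

# Each row is a tuple; unpack the single selected column.
malaysian_customers = [name for (name,) in rows]
print(malaysian_customers)

conn.close()
```

The query uses a placeholder (?) rather than pasting the value into the SQL text, which is the usual way to pass parameters safely.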
Social Media Data
- Data obtained from an organization's own social media channels on how
  users share, view or engage with the content. Channel owners can obtain
  analytics data for their own channels using social media analytics tools.
- Publicly available information shared by social media users on their own
  channels. Social listening refers to scraping social media sites for
  content.
Publicly Available Datasets
There are many publicly available open data sets:
- Government data, such as
  - Malaysia Open Data Portal (data.gov.my)
  - Malaysian Department of Statistics (dosm.gov.my)
  - European Data (data.europa.eu)
- Kaggle (kaggle.com)
- Google Dataset Search (https://datasetsearch.research.google.com/)
- Datahub.io
The data is usually downloadable in text format, as comma-separated values
(CSV) files.
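A minimal sketch of reading CSV data with Python's built-in csv module is shown below. The column names and values are made up for illustration, and the text is held in a string so the example runs without a downloaded file; with a real dataset, the same reader would be given an opened file instead.

```python
import csv
import io

# Stand-in for the contents of a downloaded CSV file (hypothetical sample data).
csv_text = """region,year,value
North,2022,120
South,2022,95
East,2022,143
"""

# DictReader uses the first line as column names and yields one dict per record.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

# CSV values are read as text, so numeric columns must be converted before use.
total_value = sum(int(record["value"]) for record in records)

print(len(records), "records, total value:", total_value)
```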
Sensor Data
Sensor data comes from IoT (Internet of Things) data sources as a data stream.
A streaming data pipeline has to be set up for the data to flow from the
source to the destination.
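The idea of a source, a pipeline, and a destination can be sketched with Python generators. This is only a toy model under stated assumptions: the sensor is simulated with random numbers, the field names are hypothetical, and the destination is a plain list rather than a real database or message queue.

```python
import random
from typing import Dict, Iterable, Iterator, List


def sensor_source(n_readings: int, seed: int = 0) -> Iterator[Dict]:
    """Simulated IoT source: yields one temperature reading at a time."""
    rng = random.Random(seed)
    for i in range(n_readings):
        yield {"reading_id": i, "temperature_c": round(rng.uniform(20.0, 30.0), 2)}


def add_fahrenheit(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Pipeline step: enrich each reading as it flows through."""
    for reading in stream:
        fahrenheit = round(reading["temperature_c"] * 9 / 5 + 32, 2)
        yield {**reading, "temperature_f": fahrenheit}


# Destination: a list standing in for a database or dashboard.
destination: List[Dict] = []
for reading in add_fahrenheit(sensor_source(5)):
    destination.append(reading)

print(len(destination), "readings delivered")
```

Each reading is processed as soon as it is produced, rather than waiting for a complete file, which is the main difference from the downloadable datasets described earlier.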
Surveys and Market Research
Organizations may also conduct surveys to
collect data for analysis or perform market
research.
Some companies specialize in data collection
and market research; organizations may pay for
their services or purchase the data they have collected.
Describing Data
Data items are usually stored in a table format:
GameID   GameTitle                     ReleaseDate  Price    Rating
1692392  Feudal Fantasy Incremental    17/02/2023   RM4.27   Teen
1693823  GameDev Life Simulator        18/02/2023   RM49.00  E10+
1682912  Going Deep                    16/02/2023   RM13.99  Teen
3690293  Horror Adventure              18/02/2023   RM26.75  Teen
1691032  IBIS AM                       17/02/2023   RM5.69   Mature
1702983  Maze (The Amazing Labyrinth)  20/02/2023   RM26.75  Everyone
1698732  Mountain Alpaca               19/02/2023   RM5.69   Everyone
1704391  Parasomnia                    20/02/2023   RM8.50   Mature
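Each column is a variable, each row is a record (also called an observation,
case, or instance), and each cell holds a value.
Data types: GameID is an ID, GameTitle is Character, ReleaseDate is Datetime,
Price is Numeric, and Rating is Character.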
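To make the data types concrete, the sketch below converts one record from the table into typed Python values. The dictionary layout and the variable names raw_record and typed_record are assumptions made for illustration; the values are taken from the first row of the table.

```python
from datetime import datetime

# One record from the table, as it might arrive from a text file (all strings).
raw_record = {
    "GameID": "1692392",
    "GameTitle": "Feudal Fantasy Incremental",
    "ReleaseDate": "17/02/2023",
    "Price": "RM4.27",
    "Rating": "Teen",
}

typed_record = {
    # ID: kept as text because it identifies a record and is not used in calculations.
    "GameID": raw_record["GameID"],
    # Character
    "GameTitle": raw_record["GameTitle"],
    # Datetime: the table uses day/month/year.
    "ReleaseDate": datetime.strptime(raw_record["ReleaseDate"], "%d/%m/%Y"),
    # Numeric: strip the currency prefix before converting.
    "Price": float(raw_record["Price"].replace("RM", "")),
    # Character
    "Rating": raw_record["Rating"],
}

print(typed_record["ReleaseDate"].date(), typed_record["Price"])
```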
Categories and Measures
Category data items
Variables with character and datetime data types are
treated as categories.
These variables have distinct values which are used to
group the records, for example by country, or by month.
These values can also be summarized to find the count of
each value, or the mode (the most frequently occurring value), for example:
- 10 records with "December" Hire Date
- "Sales" is the most frequently occurring Department
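Counting category values and finding the mode can be sketched with Python's collections.Counter. The department values below are hypothetical sample data, not taken from a real dataset.

```python
from collections import Counter

# Hypothetical Department values, one per employee record.
departments = ["Sales", "IT", "Sales", "Finance", "Sales", "IT"]

# Count how many records fall into each category.
counts = Counter(departments)

# The mode is the most frequently occurring value.
mode_value, mode_count = counts.most_common(1)[0]

print(dict(counts))
print("Mode:", mode_value, "with", mode_count, "records")
```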
Measure data items
Numeric data items are treated as measures.
These are data items whose values can be used
in calculations.
These values are usually summarized by finding, for example,
the mean, sum, or standard deviation:
- Mean Annual Salary of $50,000
- Sum of Total Orders
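The same summaries can be computed with Python's built-in statistics module. The salary figures are hypothetical and chosen so the results are easy to check by hand; statistics.stdev computes the sample standard deviation, which is one of several conventions.

```python
import statistics

# Hypothetical Annual Salary values for three employees.
salaries = [40000, 50000, 60000]

mean_salary = statistics.mean(salaries)   # arithmetic mean
total_salary = sum(salaries)              # sum
salary_stdev = statistics.stdev(salaries) # sample standard deviation

print("Mean:", mean_salary)
print("Sum:", total_salary)
print("Standard deviation:", salary_stdev)
```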
Ethics and Privacy in Data Science
A data science project may have an impact on society
in terms of:
• Ethical issues
• Privacy issues
Ethical Issues
Definition
Ethics
1. a set of moral principles : a theory or
system of moral values
2. the principles of conduct governing an
individual or a group
Source: “Ethic.” Merriam-Webster.com Dictionary, Merriam-Webster,
https://www.merriam-webster.com/dictionary/ethic. Accessed 19 Feb. 2023.
Ethical Issues in Data Science
- Denying access to social services to individuals who criticize government
  policies
- Classifying neighborhoods as "safe" or "unsafe" may affect the livelihood
  of business owners
- Determining whether to hire a new employee by analysing their social media
  content
- Using customer data from an online contest to market new products
Ethics
Recognizing an Ethical Issue
Informal methods:
1. Is there someone who would like it to be kept quiet?
2. Would you tell your mother?
3. Would you talk about it on YouTube?
4. If you advertised it, would people admire or criticize your organization?
5. What does your instinct tell you?
Formal methods:
1. Does the act violate corporate policy?
2. Does it violate a corporate or professional code of conduct or ethics?
3. Does it violate the “Golden Rule” (treat others the way you wish them to
treat you)?
Professional Code of Ethics
Data scientists may adhere to a professional code of conduct based on their
membership in professional associations:
- The Malaysia Board of Technologists (MBOT) has a Code of Ethics for
Technologists and Technicians, which covers general professional practice
for technical professionals
- The Data Science Association has a more specialized Data Science Code
of Professional Conduct geared towards data science professionals
Data Protection
In order to protect individual data privacy, governments have
implemented data protection laws:
- The European GDPR (General Data Protection Regulation) is
applicable to all European Union member countries
- An interactive map by DLA Piper shows other countries with data
protection laws
Malaysia’s PDPA
Malaysia implemented the Personal Data Protection Act 2010 (PDPA) to ensure
information security by all organizations that perform data processing
relating to commercial transactions in Malaysia.
The Act is intended to ensure that organisations:
- Explain the purpose of data collection
- Seek consent for data collection
- Establish data protection policies
Three roles are defined in the PDPA:
Data User
• A licensed organization or individual who processes, has control over, or
authorizes the processing of personal data
Data Subject
• An individual who is the subject of personal data
Data Processor
• Any person, other than an employee of the Data User, who processes the
personal data on behalf of the Data User
Personal Data
Personal data in the PDPA means information
- in respect of commercial transactions
- that relates directly or indirectly to a data subject
- who is identified or identifiable from that information, or from that and
other information in the possession of the data user.
Sensitive personal data means any information related to:
- Physical or mental health or condition
- Political opinions
- Religious or other similar beliefs
- Offence records
Personal Data Protection Principles
General Principle
• Prohibits a data user from processing a data subject's personal
data without his/her consent
Notice and Choice Principle
• Requires a data user to inform a data subject about how the personal
data is being used and to provide a means of giving consent
Disclosure Principle
• Prohibits the disclosure of personal data without consent
Security Principle
• Obligation of the data user to protect the personal data
Retention Principle
• The personal data is not to be retained longer than necessary
Data Integrity Principle
• Responsibility of the data user to take reasonable steps to ensure
the personal data is accurate and complete
Access Principle
• The data subject has the right to access and correct his/her own
data
The Rights of Data Subjects
Right of Access to Personal Data
• A data subject can request information on the personal data that is being
processed
Right to Correct Personal Data
• A data subject can request that personal data be corrected if it is misleading,
inaccurate or outdated
Right to Withdraw Consent
• A data subject may request to withdraw consent for processing of personal data
Right to Prevent Processing
• A data subject may request the data user not to begin, or to cease, processing
of personal data:
  • that is causing or likely to cause damage or distress
  • for the purpose of direct marketing
[Source: pdp.gov.my]
Ethics in Data Science
Case Study:
You are working on a data science project for a non-governmental
organization that collects donations. You would like to collect information
about donors and the amounts that they have donated to various causes. You
hope that, with information about how much they have donated and how often,
you will be able to identify potential donors who are likely to make more
donations in the future and run targeted marketing campaigns to encourage them.
Ethical and Privacy Issues
A question -> Collect the data -> Explore the data -> Model the data -> An answer
Discuss what will need to be done for the data science project described.
What are some ethical and privacy issues that need to be
considered? How would you address them?
References
Bruce, P.C. and Fleming, G. (2021). Responsible Data Science. Wiley.
Personal Data Protection Commissioner Malaysia (n.d.). What you need to know? Personal Data Protection Act 2010. Department of Personal Data Protection, https://www.pdp.gov.my/jpdpv2/
Pierson, L. (2021). Data Science For Dummies. For Dummies.
Van Der Velden, J. (2021). Introduction to Data Science Course Notes. SAS Institute.