A talk about the importance of data quality, and how to assess it. Notes available at https://anacanhoto.com/2023/03/27/its-not-because-a-dataset-is-big-that-it-will-be-good-and-it-is-not-because-we-used-a-sophisticated-algorithm-that-the-decision-will-be-fine/
Data Qual.pptx
1. The role of data quality in
managerial decision making
Ana Isabel Canhoto
University of Sussex Business School
a.i.canhoto@sussex.ac.uk
www.anacanhoto.com;
@canhoto
2. Customer Data
Customer insight
Service automation - e.g., Customer service, Recommendations, Customer screening, Market research…
generates
Machine Learning
© Ana Isabel Canhoto, 2023
Service Automation
3. The promise of Big Data + ML
FIG. 1 - The encoding of platform participation by social media.
Viewed in this light, social media establish online a drastically simplified version of social interaction and communication. Essentially, on social media basic things or entities such as users, comments, photos, posts are all classified as data objects and every activity connecting two objects as action. For instance, Facebook defines status updates, pictures, videos, etc. as objects because in this way objects can be connected, or, as Facebook calls it, edged (Bucher 2012). Through this elementary syntax, every action undertaken on Facebook generates an edge, that is, a link connecting two objects. “Liking an object, tagging a photo, leaving a comment, these are all edge generators.”3 Encoding activities such as sharing, tagging, liking, and so on provide connections between two objects that can be further computed (see Figure 2). By processing the data resulting from the encoding of user interaction, the system is able to extract potentially meaningful sets of information on user behavior. For instance, in the case of Facebook, connections or edges are ranked under different criteria, such as how recent they are (what Facebook calls time decay), or how close the two end-users connected are (what Facebook calls affinity) (see Bucher 2012).
FIG. 2 - The exemplified script of social interaction as encoding of data.
3 See for instance Taylor (2011) “Everything you need to know about Facebook’s EdgeRank” The
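The idea that each user action creates an edge between two objects, and that edges can then be ranked by recency and closeness, can be illustrated with a minimal Python sketch. This is not Facebook's actual ranking: the `Edge` fields, the exponential `time_decay` function, its 24-hour half-life, the multiplicative `score`, and all sample values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A link between two data objects, created by one user action."""
    source: str       # e.g. a user
    target: str       # e.g. a photo or a post
    action: str       # e.g. "like", "tag", "comment"
    age_hours: float  # time elapsed since the action
    affinity: float   # assumed closeness of the two end-users, from 0 to 1


def time_decay(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed decay: the weight of an edge halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def score(edge: Edge) -> float:
    """Illustrative score: closer and more recent edges score higher."""
    return edge.affinity * time_decay(edge.age_hours)


def rank_edges(edges: list[Edge]) -> list[Edge]:
    """Order edges from highest to lowest illustrative score."""
    return sorted(edges, key=score, reverse=True)


# Made-up sample edges, not real data.
edges = [
    Edge("ana", "photo_1", "like", age_hours=48.0, affinity=0.9),
    Edge("ben", "post_7", "comment", age_hours=1.0, affinity=0.4),
    Edge("cat", "photo_1", "tag", age_hours=24.0, affinity=0.6),
]
ranked = rank_edges(edges)
```

In this toy example the most recent edge ranks first even though its affinity is the lowest, which shows how the choice of ranking criteria shapes what the data appears to say about users.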
4. The promise of Big Data + ML
Source: https://doi.org/10.1016/j.bushor.2019.11.003
Type | Description | Examples
Historical | Records of past events stored in internal or external databases | Customer's past transaction data; external credit rating information
Real time | Activity data collected via sensors or by online trackers | Beacons in stores, or tracking of online activity
Knowledge | Records of outcomes of past problem-solving exercises | Past product recommendations which were accepted or rejected
5. The promise of Big Data + ML
• Novel insight
• Cost savings
• Branding
Source: https://doi.org/10.1016/j.indmarman.2021.11.001
6. The banking project
More: https://www.starlingbank.com/docs/reports-research/StarlingGenderRepresentationReport.pdf
8.
Source: https://www.jamaissanselles.fr/biais-intelligence-artificielle/
9. The handful of datasets that rule our lives
Over 50% of dataset usages in PWC as of June 2021 can be attributed to just twelve institutions. Moreover, this concentration… has increased to over 0.80 in recent years (Figure 3 right red)
Source: https://arxiv.org/abs/2112.01716
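To make the notion of concentration concrete, here is a small Python sketch that computes two simple measures over dataset-usage counts per institution. The excerpt above does not state how concentration is measured, so the use of a top-k share and of the Gini coefficient is an assumption chosen for illustration, and the institution names and counts are invented, not taken from the cited source.

```python
def top_k_share(counts: list[int], k: int) -> float:
    """Share of all usages accounted for by the k largest contributors."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def gini(counts: list[int]) -> float:
    """Gini coefficient: 0 means perfectly even, values near 1 mean highly concentrated."""
    xs = sorted(counts)  # ascending order is required by this formula
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * sum(xs)) - (n + 1) / n


# Invented usage counts for four hypothetical institutions.
usages = {"inst_a": 60, "inst_b": 25, "inst_c": 10, "inst_d": 5}
counts = list(usages.values())

share_top_two = top_k_share(counts, k=2)  # fraction of usages from the two largest
concentration = gini(counts)
```

A high top-k share or a high Gini value both signal that a few sources dominate, which is the situation the slide warns about: decisions inherit whatever blind spots those few datasets have.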
10. The promise of Big Data + ML
11. Customer Data
Customer insight
Service automation - e.g., Customer service, Recommendations, Customer screening, Market research…
generates
Machine Learning
It starts (and ends) with data
12. Quality of data needed to inform decision making
Quality of data available to inform decision making
The data quality gap
13. Data quality
Product
Production
Access
Use
Adapted from: Kahn, B. K., Strong, D. M. & Wang, R. Y. (2002). Information quality benchmarks: Product and service performance. Communications of the ACM 45(4): 184-192.
Sound (What is shared)
Dependable (How it is shared)
Fit for use (What is gathered)
Usable (How it is gathered)
• Achieve goal
• Accurate representation (identity, activities, state of mind…)
• Explains phenomenon
• Novel
• Timely
• Understandable
• Availability
• Ease of use
• Task fit (affordances)
• Netiquette (norms)
• Accessible
• Easy to integrate
• Cost effective
• High quality source
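One way to put the data quality gap to work is to score a dataset against some of the criteria listed above, once for the quality the decision needs and once for the quality actually available, and then compare the two. The Python sketch below is only an illustration: the 1-to-5 scale, the function name `quality_gap`, the choice of four criteria, and every score are assumptions, not part of the original framework.

```python
def quality_gap(needed: dict[str, int], available: dict[str, int]) -> dict[str, int]:
    """Per criterion, how far the available quality falls short of what is needed.

    A criterion that meets or exceeds the need has a gap of 0.
    A criterion missing from `available` is treated as scoring 0.
    """
    return {
        criterion: max(level - available.get(criterion, 0), 0)
        for criterion, level in needed.items()
    }


# Hypothetical scores on an assumed 1 (poor) to 5 (excellent) scale.
needed = {
    "Accurate representation": 5,
    "Timely": 4,
    "Accessible": 3,
    "Cost effective": 3,
}
available = {
    "Accurate representation": 3,
    "Timely": 4,
    "Accessible": 5,
    "Cost effective": 2,
}

gaps = quality_gap(needed, available)
worst = max(gaps, key=gaps.get)  # criterion with the largest shortfall
```

Here the dataset is easy to access but falls short on accurate representation, so an exercise like this points to where extra data collection or cleaning effort would matter most before the data feeds a decision.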
14. Fit for use
Usable
Sound
Dependable
Product
Production
Access
Use
• Explains phenomenon
• High quality source
• Accurate representation
• Availability
• Accessible
• Easy to integrate
• Cost effective
• Netiquette (norms)
• Task fit (affordances)
• Ease of use
• Understandable
• Timely
• Novel
• Achieve goal
Data Qual
15. Customer Data
Customer insight
Service automation - e.g., Customer service, Recommendations, Customer screening, Market research…
generates
Machine Learning
Service Automation
16. The role of data quality in
managerial decision making
Ana Isabel Canhoto
University of Sussex Business School
a.i.canhoto@sussex.ac.uk
www.anacanhoto.com;
@canhoto