118. Example: Label Noise Remediation
Identify noisy
samples
•Check target columns /
attributes to detect suspect
noisy or anomalous labels
•E.g. Target column with the
following entries :
A B Label
Abc def 1
Abc def 2
Abc def 1
Abc def 1
Visual preview of
noisy data
•Sampling to show presentable
number of records in batches
> Show suspected noisy class
labels and predicted labels to
preview
A B Label Predicted
Abc def 1 1
Abc def 2 1
Abc def 1 1
Abc def 1 1
Remediation input
from HIL
•Review
•Correct
•Approve
> HIL reviews if row is
noisy or not, verifies if
predicted label is correct or
not and additional marks
pertinent columns
corresponding
Document and
preserve
•Learn operations for
future recommendations
to reduce HIL effort and
enable reuse/transfer of
knowledge
> In future, better
prediction using pertinent
columns and user
actions
119. Example: Data Homogeneity and Transformation
Identify
heterogenous
samples
•Check target columns /
attributes to detect variation
on sample data
•E.g. Target column with the
following entries :
<=50K
<=50K.
>50K
>50K.
E.g. Gender column with the
following entries:
male
M
female
F
Visual preview of
problematic data
•Sampling to show
presentable number
•Give explanation
•Viewing options to reduce
human effort
> Show unique class labels
and ask for
correction/verification
> Show distinct formats and
ask for correction/verification
Transformation
input from HIL
•Review
•Correct
•Approve
> HIL sees anomaly in the
‘.’ appearing at end and
provides corresponding
transformation
> Rectify heterogenous
data formats and specify
rule for transformation
Document and
preserve
•Learn operations for
future recommendations
to reduce HIL effort and
enable reuse/transfer of
knowledge
> In future, suggest
remedial action
> Suggest possible
remediations or rules