Here I describe a classical approach to data science projects and show why it is not enough. Metadata can then help to control the differences between serving and training data. Moreover, metadata can be used to create model unit tests.
1. (Data/Model) Quality in Data Science Projects: No Garbage In, No Garbage Out
Zinayida Kensche @ DataNatives 2019
4. Preparing data can be a big effort
Record linkage:
Fossil RACHEL - Handbag
Colour: black
189,95 €
Outer material: Leather
Lining: Textile
Fastening: Zip
Compartments: Mobile phone pocket
Fossil Rachel Tote bag leather black
SKU# ZB7507001
€141.75
one slip pocket on the front
slip pocket on the back
closes with zipper
two leather handles (handle drop 22 cm)
fittings of gold-coloured metal
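The two listings above describe the same bag with different titles, prices and attribute layouts. A minimal sketch of record linkage, assuming a simple token-overlap (Jaccard) score on the titles; the title strings are assembled from the two listings, and the threshold `SAME_PRODUCT_THRESHOLD` is a hypothetical value, not one from the talk:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase a product title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Share of tokens the two titles have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

# Titles put together from the two shop listings.
shop_a = "Fossil RACHEL - Handbag black"
shop_b = "Fossil Rachel Tote bag leather black"

# Hypothetical cut-off; in practice it has to be tuned on labelled pairs.
SAME_PRODUCT_THRESHOLD = 0.4

score = jaccard(shop_a, shop_b)
is_match = score >= SAME_PRODUCT_THRESHOLD
print(f"similarity={score:.2f}, same product: {is_match}")
```

A title score alone is fragile ("Handbag" and "Tote bag" share no token), so a real linkage step would likely also compare brand, colour and material fields.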
5. Some required checks:
● Feature's min, max, mean, most common value
● Histograms (the ratio for each bag manufacturer)
● Fraction of null values (the bag color must be in at least 90% of entries to run recommendations)
● Is the cardinality known to us? (several outer materials)
● Any outliers outside the normal distribution
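The checks listed above can be sketched with the Python standard library. All sample values, the set of known materials and the z-score threshold are hypothetical; the z-score rule is just one simple way to flag values far from the mean:

```python
from collections import Counter
from statistics import mean, stdev

def profile_numeric(values):
    """Min, max, mean and most common value of a numeric feature (nulls skipped)."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "most_common": Counter(present).most_common(1)[0][0],
    }

def category_ratios(values):
    """Histogram as ratios, e.g. the share of each bag manufacturer."""
    present = [v for v in values if v is not None]
    return {value: count / len(present) for value, count in Counter(present).items()}

def present_fraction(values):
    """Fraction of entries that are not null."""
    return sum(v is not None for v in values) / len(values)

def unknown_categories(values, known):
    """Category values that are not in the set we expect (cardinality check)."""
    return {v for v in values if v is not None} - set(known)

def z_score_outliers(values, z_threshold=3.0):
    """Values whose distance from the mean exceeds z_threshold standard deviations."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), stdev(present)
    if sigma == 0:
        return []
    return [v for v in present if abs(v - mu) / sigma > z_threshold]

# Hypothetical sample data.
prices = [100, 102, 98, 101, 99, 100, 103, 97, 100, 1000]
colours = ["black", "black", "brown", None, "black",
           "brown", "black", "black", "brown", "black"]
manufacturers = ["Fossil", "Fossil", "Other", "Fossil"]
materials = ["leather", "textile", "leatherette", "leather"]

profile = profile_numeric(prices)
ratios = category_ratios(manufacturers)
colour_ok = present_fraction(colours) >= 0.9          # the 90% rule
unknown = unknown_categories(materials, {"leather", "textile"})
# With only 10 values a z-score cannot exceed about 2.85, so use 2.5 here.
flagged = z_score_outliers(prices, z_threshold=2.5)
print(profile, ratios, colour_ok, unknown, flagged)
```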
6. False discoveries through multiple hypothesis testing
[Figure labels: Significant, not significant; Not important]
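Why many checks produce false discoveries can be shown with a small calculation: if each of m independent tests uses the level α, the chance of at least one false alarm is 1 - (1 - α)^m. The sketch below also applies a Bonferroni correction, which is one common remedy and is named here as an illustration, not as the method used in the talk; the p-values are hypothetical:

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag a test as significant only if p <= alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# 20 data checks, each tested at the 5% level.
fwer = family_wise_error_rate(0.05, 20)
print(f"chance of at least one false discovery: {fwer:.2f}")

# Hypothetical p-values from four feature checks.
p_values = [0.001, 0.02, 0.04, 0.30]
naive = [p <= 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
print(naive, corrected)
```

Without the correction three of the four checks look significant; with it only the first one does.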
7. Machine Learning Pipeline
[Diagram labels]
Quality check: data cleaning, record linkage, data profiling, data standardising
Stages: Model building; Model Evaluation & Fixing; Deployment, Monitoring, Validation & Fixing
Phases: Experimentation; Testing
Data: Training data; Test data; Serving data
Models: ML-models; Best model; Productive model
8. Machine Learning Pipeline
[Diagram labels]
Quality check: data cleaning, record linkage, data profiling, data standardising
Advanced quality check: feature skew and distribution skew monitoring
Stages: Model building; Model Evaluation & Fixing; Deployment, Monitoring, Validation & Fixing
Phases: Experimentation; Testing
Data: Training data; Test data; Serving data
Models: ML-models; Best model; Productive model
9. What is skew?
● Feature-based skew:
Find training data slices that lead to high/low model performance
Bags made of imitation leather are not recommended to customers => imitation leather was not considered in the training data
● Distribution-based skew:
Are there any deviations between training and serving data?
Measure distribution distances using cosine similarity, Kolmogorov-Smirnov distance, KL-divergence, etc.
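Two of the distances named above can be sketched in plain Python: the two-sample Kolmogorov-Smirnov distance for a numeric feature and the KL-divergence for a categorical one. The sample values and the smoothing constant `eps` are assumptions for illustration:

```python
import math
from bisect import bisect_right
from collections import Counter

def ks_distance(train, serving):
    """Largest gap between the empirical CDFs of the two samples (0.0 to 1.0)."""
    train_sorted, serving_sorted = sorted(train), sorted(serving)
    distance = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in train_sorted + serving_sorted:
        cdf_train = bisect_right(train_sorted, x) / len(train_sorted)
        cdf_serving = bisect_right(serving_sorted, x) / len(serving_sorted)
        distance = max(distance, abs(cdf_train - cdf_serving))
    return distance

def kl_divergence(serving, train, eps=1e-6):
    """KL(serving || train) over category shares; eps smooths unseen categories."""
    categories = set(serving) | set(train)
    p_counts, q_counts = Counter(serving), Counter(train)
    p_total = len(serving) + eps * len(categories)
    q_total = len(train) + eps * len(categories)
    total = 0.0
    for category in categories:
        p = (p_counts[category] + eps) / p_total
        q = (q_counts[category] + eps) / q_total
        total += p * math.log(p / q)
    return total

# Hypothetical numeric feature: the serving values are shifted upwards.
train_age = [1, 2, 3, 4, 5, 6, 7, 8]
serving_age = [5, 6, 7, 8, 9, 10, 11, 12]
ks = ks_distance(train_age, serving_age)

# Hypothetical categorical feature: a material unseen in training shows up.
train_material = ["leather"] * 8 + ["textile"] * 2
serving_material = ["leather"] * 6 + ["textile"] * 2 + ["imitation leather"] * 2
kl = kl_divergence(serving_material, train_material)
print(f"KS distance={ks:.2f}, KL divergence={kl:.2f}")
```

A category that never occurred in training dominates the KL value, which is the behaviour one wants from a skew alert; the alert thresholds themselves have to be chosen per feature.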
10. Alert → Action
Data quality dashboard
Features
Distribution
Anomaly Alerts
Bugs in Data Acquisition or Ingestion?
Problems with source data - RPC timeout?
11. Models don't answer unasked questions
● Imitation leather = Leatherette
● The age metric has changed: from days to hours
● New features relevant for recommendations
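The three cases above (a synonym, a changed unit, a new feature) are the kind of deviation that a metadata check can catch before the data reaches the model. A minimal sketch, assuming a hand-written metadata dictionary; `FEATURE_METADATA`, its fields and the sample records are hypothetical:

```python
# Hypothetical metadata describing what the model was trained on.
FEATURE_METADATA = {
    "outer_material": {
        "type": "category",
        "allowed": {"leather", "textile", "leatherette"},
        "synonyms": {"imitation leather": "leatherette"},
    },
    "age": {"type": "number", "unit": "days", "min": 0, "max": 3650},
}

def check_record(record, units, metadata=FEATURE_METADATA):
    """Return a list of deviations between one serving record and the metadata."""
    problems = []
    # Features the model has never seen.
    for name in sorted(record.keys() - metadata.keys()):
        problems.append(f"unknown feature: {name}")
    for name, spec in metadata.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
            continue
        value = record[name]
        if spec["type"] == "category":
            # Map known synonyms to the training vocabulary first.
            value = spec["synonyms"].get(value, value)
            if value not in spec["allowed"]:
                problems.append(f"unexpected category for {name}: {value}")
        elif units.get(name) != spec["unit"]:
            problems.append(f"unit mismatch for {name}: expected {spec['unit']}")
        elif not spec["min"] <= value <= spec["max"]:
            problems.append(f"value out of range for {name}: {value}")
    return problems

# A record that matches the metadata once the synonym is resolved.
ok = check_record({"outer_material": "imitation leather", "age": 30},
                  units={"age": "days"})

# A record with a new feature, an unknown material and age given in hours.
bad = check_record({"outer_material": "cork", "age": 720, "strap_length": 22},
                   units={"age": "hours"})
print(ok, bad)
```

The same function can serve as a model unit test: run it over a batch of generated or serving records and fail the pipeline when the list of problems is not empty.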
12. Metadata for Data/Model Quality
[Diagram labels]
Metadata: check for deviations
Stages: Model building; Model Evaluation & Fixing; Deployment, Monitoring, Validation & Fixing
Phases: Experimentation; Testing
Data: Training data; Test data; Generated data; Serving data
Models: ML-models; Best model; Productive model
13. Having the right data is crucial
14. References:
● Danilo Sato, Arif Wider, Christoph Windheuser, "Continuous Delivery for Machine Learning"
● Alkis Polyzotis, Martin A. Zinkevich, Steven Whang, Sudip Roy, "Data Management Challenges in Production Machine Learning", ICMD 2017
● Eric Breck, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, Martin Zinkevich, "Data validation for machine learning", SysML '19
● https://www.scientificamerican.com/article/how-a-machine-learns-prejudice/