Data (~Model) Quality in Data Science Projects:
No Garbage In, No Garbage Out
Zinaida Kensche @ DataNatives 2019
Distinguishing Russian/American tanks
Preparing data can be a big effort
Record linkage:
Fossil > RACHEL - Handbag
Colour: black
189,95 €
Outer material: Leather
Lining: Textile
Fastening: Zip
Compartments: Mobile phone pocket
Fossil Rachel Tote bag leather black
SKU# ZB7507001
€141.75
one slip pocket on the front
slip pocket on the back
closes with zipper
two leather handles (22 cm handle drop)
fittings of gold-coloured metal
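The record-linkage task above (deciding whether two shop listings describe the same bag) can be sketched as follows. This is a minimal illustration, not the speaker's method: the record fields, the brand/colour rule and the token-Jaccard threshold of 0.25 are all hypothetical choices.

```python
import re


def tokens(title: str) -> set[str]:
    """Lower-case a product title and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_same_product(rec_a: dict, rec_b: dict, threshold: float = 0.25) -> bool:
    """Hypothetical matching rule: same brand, same colour, similar titles."""
    if rec_a["brand"].lower() != rec_b["brand"].lower():
        return False
    if rec_a["colour"].lower() != rec_b["colour"].lower():
        return False
    return jaccard(tokens(rec_a["title"]), tokens(rec_b["title"])) >= threshold


# The two listings from the slide, reduced to a few illustrative fields.
shop_a = {"brand": "Fossil", "title": "Fossil RACHEL Handbag", "colour": "black", "price_eur": 189.95}
shop_b = {"brand": "Fossil", "title": "Fossil Rachel Tote bag leather black", "colour": "black", "price_eur": 141.75}

print(is_same_product(shop_a, shop_b))  # True
```

Only "fossil" and "rachel" overlap (Jaccard 2/7), because "Handbag" and "Tote bag" tokenise differently; that is exactly why standardising titles and attributes before matching pays off.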
Some required checks:
● Features: min/max, mean, most common value
● Histograms (the ratio for each bag manufacturer)
● Fraction of null values (the bag colour must be present in at least 90% of entries to run recommendations)
● Is the cardinality known? (known to us: several outer materials)
● Any outliers outside the normal distribution?
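The checks listed above can be sketched with the standard library alone. This is a minimal sketch assuming columns arrive as plain Python lists with `None` for missing values; the function names are hypothetical, the 90% colour threshold comes from the slide, and the z-score cut-off of 3 is an assumed convention for "outside the normal distribution".

```python
from collections import Counter
from statistics import mean, pstdev


def profile_numeric(values):
    """Min/max, mean and most common value of a numeric feature, ignoring None."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "most_common": Counter(present).most_common(1)[0][0],
    }


def manufacturer_ratios(manufacturers):
    """Histogram expressed as ratios: share of rows per bag manufacturer."""
    counts = Counter(manufacturers)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def non_null_fraction(values):
    """Fraction of entries that are not None."""
    return sum(v is not None for v in values) / len(values)


def unknown_categories(values, known):
    """Values outside the expected cardinality (e.g. the known outer materials)."""
    return {v for v in values if v is not None} - set(known)


def z_score_outliers(values, z=3.0):
    """Values more than z standard deviations from the mean (assumes roughly normal data)."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        return []
    return [v for v in present if abs(v - mu) / sigma > z]


# Gate from the slide: only run recommendations if colour is filled in >= 90% of rows.
colours = ["black"] * 9 + [None]
can_recommend = non_null_fraction(colours) >= 0.9
```

In a real pipeline these would typically run on a DataFrame or inside a data-validation tool; plain lists just keep the sketch self-contained.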
False discoveries through multiple hypothesis testing
Significant, not significant, not important
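The slide names the problem but not a remedy. One common correction, chosen here for illustration and not taken from the talk, is the Benjamini-Hochberg procedure, which controls the false discovery rate when many features or slices are tested at once; Bonferroni is shown alongside as the stricter alternative. The p-values below are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return True where a hypothesis is rejected, controlling the FDR at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below k_max.
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Stricter family-wise correction: compare each p-value with alpha / m."""
    return [p <= alpha / len(p_values) for p in p_values]


p = [0.001, 0.012, 0.025, 0.045, 0.50]
naive = [x <= 0.05 for x in p]  # [True, True, True, True, False]
print(benjamini_hochberg(p))    # [True, True, True, False, False]
print(bonferroni(p))            # [True, False, False, False, False]
```

Testing each p-value naively at 0.05 would flag four "significant" results; the corrections show that the fourth one is likely a false discovery.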
[Figure: Machine Learning Pipeline. Stages: quality check (data cleaning, record linkage, data profiling, data standardising), model building, model evaluation & fixing, experimentation, testing, deployment, monitoring, validation & fixing. Artifacts: training data, test data, best model, productive model, serving data, ML models]
[Figure: Machine Learning Pipeline, extended with an advanced quality check and feature skew and distribution skew monitoring. Stages: quality check (data cleaning, record linkage, data profiling, data standardising), model building, model evaluation & fixing, experimentation, testing, deployment, monitoring, validation & fixing. Artifacts: training data, test data, best model, productive model, serving data, ML models]
What is skew?
● Feature-based skew:
Find training data slices that lead to high/low model performance
Bags made of imitation leather are not recommended to customers => imitation leather was not considered in the training data
● Distribution-based skew:
Are there any deviations between training and serving data?
Measure distribution distances using cosine similarity, Kolmogorov-Smirnov distance, KL divergence, etc.
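Both kinds of skew can be sketched in a few lines of standard-library Python. This is an illustration under assumptions: the row format (dicts with `label` and `prediction` keys), the choice of accuracy as the per-slice metric, and the `eps` smoothing for categories unseen in serving data are all hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter, defaultdict


# Feature-based skew: model performance per slice of one feature.
def accuracy_by_slice(rows, slice_key):
    """rows: dicts holding the slice feature plus 'label' and 'prediction'."""
    hits, counts = defaultdict(int), defaultdict(int)
    for row in rows:
        s = row[slice_key]
        counts[s] += 1
        hits[s] += row["label"] == row["prediction"]
    return {s: hits[s] / counts[s] for s in counts}


# Distribution-based skew: compare training and serving values of one feature.
def ks_distance(train, serving):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(train), sorted(serving)
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in points)


def kl_divergence(train, serving, eps=1e-9):
    """KL(P_train || P_serving) over categorical values; eps avoids log(p / 0)."""
    p, q = Counter(train), Counter(serving)
    total = 0.0
    for c in set(train) | set(serving):
        pc = p[c] / len(train)
        qc = q[c] / len(serving)
        if pc > 0:
            total += pc * math.log(pc / max(qc, eps))
    return total


def cosine_similarity(train, serving):
    """Cosine similarity between the category-count vectors of two samples."""
    cats = set(train) | set(serving)
    p, q = Counter(train), Counter(serving)
    dot = sum(p[c] * q[c] for c in cats)
    norm = math.sqrt(sum(p[c] ** 2 for c in cats)) * math.sqrt(sum(q[c] ** 2 for c in cats))
    return dot / norm


rows = [
    {"material": "leather", "label": 1, "prediction": 1},
    {"material": "leather", "label": 0, "prediction": 0},
    {"material": "imitation leather", "label": 1, "prediction": 0},
]
print(accuracy_by_slice(rows, "material"))  # {'leather': 1.0, 'imitation leather': 0.0}
```

A KS distance near 1, a large KL divergence or a cosine similarity well below 1 would all indicate that serving data no longer looks like training data; suitable alert thresholds depend on the feature.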
Alert → Action
Data quality dashboard:
Features
Distribution
Anomaly Alerts
Bugs in data acquisition or ingestion?
Problems with source data? RPC timeout
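The "Alert → Action" idea can be sketched as a rule that turns a batch anomaly into an alert carrying a first triage question. The mapping (no rows for a feature → suspect ingestion or an RPC timeout; too many nulls → suspect source data), the stats format and the `Alert` type are hypothetical assumptions, not the dashboard shown in the talk.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    feature: str
    message: str
    suggested_check: str  # first question to ask when the alert fires


def check_batch(batch_stats, expected):
    """Compare per-feature stats of a serving batch with expected thresholds.

    batch_stats: {feature: {"non_null": float, "row_count": int}}
    expected:    {feature: {"min_non_null": float}}
    """
    alerts = []
    for feature, rules in expected.items():
        stats = batch_stats.get(feature)
        if stats is None or stats["row_count"] == 0:
            # Nothing arrived at all: look at the pipeline before the data.
            alerts.append(Alert(feature, "no data received",
                                "Bugs in data acquisition or ingestion? RPC timeout?"))
            continue
        if stats["non_null"] < rules["min_non_null"]:
            # Data arrived but is incomplete: look at the source system.
            alerts.append(Alert(
                feature,
                f"non-null fraction {stats['non_null']:.0%} below {rules['min_non_null']:.0%}",
                "Problems with source data?"))
    return alerts


alerts = check_batch(
    {"colour": {"non_null": 0.5, "row_count": 100}},
    {"colour": {"min_non_null": 0.9}, "material": {"min_non_null": 0.9}},
)
for alert in alerts:
    print(alert.feature, "|", alert.message, "|", alert.suggested_check)
```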
Models don't answer unasked questions:
● Imitation leather = Leatherette
● The age metric has changed from days to hours
● New features relevant for recommendations
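Each of the three cases above can be caught with a small explicit check, because the model itself will not raise them. This is a sketch under assumptions: the synonym table is hypothetical and would in practice be maintained with domain experts, and the factor-of-10 median shift used to suspect a unit change (days vs. hours is a factor of 24) is an arbitrary illustrative threshold.

```python
from statistics import median

# Hypothetical synonym table mapping raw material names to one canonical value.
MATERIAL_SYNONYMS = {
    "leatherette": "imitation leather",
    "imitate leather": "imitation leather",
}


def normalise_material(value):
    """Lower-case, trim and map known synonyms to a canonical name."""
    key = value.strip().lower()
    return MATERIAL_SYNONYMS.get(key, key)


def unit_change_suspected(reference, current, factor=10.0):
    """Flag a probable unit change when the median shifts by more than `factor`."""
    ref, cur = median(reference), median(current)
    if ref == 0 or cur == 0:
        return ref != cur
    ratio = cur / ref
    return ratio > factor or ratio < 1 / factor


def new_features(training_columns, serving_columns):
    """Columns present at serving time that the model was never trained on."""
    return sorted(set(serving_columns) - set(training_columns))


age_in_days = [3, 5, 7]
age_in_hours = [72, 120, 168]
print(unit_change_suspected(age_in_days, age_in_hours))  # True
```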
Metadata for Data/Model Quality
[Figure: Machine Learning Pipeline with "Metadata: check for deviations". Stages: model building, model evaluation & fixing, experimentation, testing, deployment, monitoring, validation & fixing. Artifacts: training data, test data, best model, productive model, generated data, serving data, ML models]
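Checking metadata for deviations can be sketched as recording simple per-column metadata at training time and diffing it against the same metadata collected on serving data. The metadata chosen here (column names, value types, observed string values) and the message format are hypothetical; real systems usually store a richer schema alongside the model.

```python
def collect_metadata(rows):
    """Record per-column value type names and observed string values."""
    meta = {}
    for row in rows:
        for col, val in row.items():
            entry = meta.setdefault(col, {"types": set(), "values": set()})
            entry["types"].add(type(val).__name__)
            if isinstance(val, str):
                entry["values"].add(val)
    return meta


def deviations(train_meta, serving_meta):
    """List human-readable differences between training and serving metadata."""
    issues = []
    for col in train_meta.keys() - serving_meta.keys():
        issues.append(f"missing column: {col}")
    for col in serving_meta.keys() - train_meta.keys():
        issues.append(f"unexpected column: {col}")
    for col in train_meta.keys() & serving_meta.keys():
        new_types = serving_meta[col]["types"] - train_meta[col]["types"]
        if new_types:
            issues.append(f"{col}: new types {sorted(new_types)}")
        new_values = serving_meta[col]["values"] - train_meta[col]["values"]
        if new_values:
            issues.append(f"{col}: unseen values {sorted(new_values)}")
    return sorted(issues)


train_meta = collect_metadata([{"colour": "black", "material": "leather", "price": 189.95}])
serving_meta = collect_metadata([{"colour": "black", "material": "leatherette", "price": "141.75"}])
for issue in deviations(train_meta, serving_meta):
    print(issue)
```

Here the serving batch introduces an unseen material and sends the price as a string, both of which a metadata diff surfaces before the model silently mispredicts.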
Having the right data is crucial
References:
● Danilo Sato, Arif Wider, Christoph Windheuser: “Continuous Delivery for Machine Learning”
● Alkis Polyzotis, Martin A. Zinkevich, Steven Whang, Sudip Roy: “Data Management Challenges in Production Machine Learning”, ICMD 2017
● Eric Breck, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, Martin Zinkevich: “Data validation for machine learning”, SysML '19
● https://www.scientificamerican.com/article/how-a-machine-learns-prejudice/
