10 STEPS FOR HIGH-QUALITY DATASETS
BY PIER GIUSEPPE DE MEO
#1
Keep your Datasets separate.
#2
Prepare a toolbox of reusable transformation processes (procedures, functions, scripts, etc.).
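One way to picture the toolbox is a registry of small, named, side-effect-free functions that any Dataset can reuse. The sketch below assumes each Dataset is a list of Python dicts. `TOOLBOX`, `register`, `apply` and the two sample transformations are hypothetical names chosen for illustration, not part of the original method.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transformation = Callable[[Record], Record]

# Hypothetical toolbox: a registry of reusable transformations, looked up by name.
TOOLBOX: Dict[str, Transformation] = {}


def register(name: str) -> Callable[[Transformation], Transformation]:
    """Decorator that adds a transformation to the toolbox under a given name."""
    def decorator(func: Transformation) -> Transformation:
        TOOLBOX[name] = func
        return func
    return decorator


@register("fill_missing_amount")
def fill_missing_amount(record: Record) -> Record:
    # Replace a missing amount with 0.0; returns a new dict, the input is untouched.
    out = dict(record)
    if out.get("amount") is None:
        out["amount"] = 0.0
    return out


@register("strip_text")
def strip_text(record: Record) -> Record:
    # Trim surrounding whitespace from every string field.
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def apply(name: str, records: List[Record]) -> List[Record]:
    """Apply a named toolbox transformation to every record of a Dataset."""
    return [TOOLBOX[name](r) for r in records]
```

Returning new dicts instead of mutating records keeps each source Dataset intact, which fits the advice in step #1 to keep Datasets separate.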
#3
Group the types of transformation logically into categories (e.g. missing values, decodes, normalization, etc.).
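As a rough sketch, the toolbox can be keyed by category so that all transformations of one kind run together. The three categories mirror the examples in the step; the field names (`country`, `status`, `email`), the `STATUS_LABELS` decode table and the function names are assumptions made up for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transformation = Callable[[Record], Record]


def fill_missing_country(record: Record) -> Record:
    # Missing values: default an absent or empty country code to "UNKNOWN".
    out = dict(record)
    if not out.get("country"):
        out["country"] = "UNKNOWN"
    return out


# Hypothetical decode table mapping internal status codes to readable labels.
STATUS_LABELS = {"A": "active", "C": "closed"}


def decode_status(record: Record) -> Record:
    # Decodes: translate a status code into its label, keeping unknown codes as-is.
    out = dict(record)
    out["status"] = STATUS_LABELS.get(out.get("status"), out.get("status"))
    return out


def normalize_email(record: Record) -> Record:
    # Normalization: lowercase and trim email addresses.
    out = dict(record)
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].strip().lower()
    return out


# The toolbox organized by category, so a whole category can be applied as a unit.
TOOLBOX_BY_CATEGORY: Dict[str, List[Transformation]] = {
    "missing_values": [fill_missing_country],
    "decodes": [decode_status],
    "normalization": [normalize_email],
}


def apply_category(category: str, record: Record) -> Record:
    """Run every transformation of one category, in order, on a record."""
    for transform in TOOLBOX_BY_CATEGORY[category]:
        record = transform(record)
    return record
```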
#4
For each category identified, select the subset of data in a Dataset to which that type of transformation applies; repeat this process for each of your Datasets separately.
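A minimal sketch of this step, assuming each category is paired with a selector (a predicate that picks the subset needing the fix) and a transformation. The `sales` and `refunds` Datasets, their fields and the helper names are hypothetical. Each Dataset is cleaned independently, and nothing is joined yet.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Predicate = Callable[[Record], bool]
Transformation = Callable[[Record], Record]


def needs_amount_fill(record: Record) -> bool:
    # Selector: only records whose amount is missing belong to this subset.
    return record.get("amount") is None


def fill_amount(record: Record) -> Record:
    return {**record, "amount": 0.0}


def needs_status_decode(record: Record) -> bool:
    # Selector: only records carrying a known status code are decoded.
    return record.get("status") in {"A", "C"}


def decode_status(record: Record) -> Record:
    return {**record, "status": {"A": "active", "C": "closed"}[record["status"]]}


# Hypothetical pairing of each category with its (selector, transformation).
CATEGORIES: Dict[str, Tuple[Predicate, Transformation]] = {
    "missing_values": (needs_amount_fill, fill_amount),
    "decodes": (needs_status_decode, decode_status),
}


def clean_dataset(records: List[Record]) -> List[Record]:
    """Apply each category only to the subset of records that needs it."""
    for selector, transform in CATEGORIES.values():
        records = [transform(r) if selector(r) else r for r in records]
    return records


# Each Dataset is processed on its own.
datasets = {
    "sales": [{"amount": None, "status": "A"}, {"amount": 3.0, "status": "X"}],
    "refunds": [{"amount": 1.0, "status": "C"}],
}
cleaned = {name: clean_dataset(rows) for name, rows in datasets.items()}
```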
#5
For each Dataset, if needed, enrich its data with derived information (e.g. calculated fields, extraction of sub-information, etc.).
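The two kinds of enrichment named in the step can be sketched on a single record: a calculated field and fields extracted from an existing value. The columns `quantity`, `unit_price`, `email` and `date`, and the derived names, are assumptions for illustration only.

```python
from datetime import date
from typing import Dict

Record = Dict[str, object]


def enrich(record: Record) -> Record:
    """Add derived fields to one record without altering the source fields."""
    out = dict(record)
    # Calculated field: line total from hypothetical quantity and unit_price columns.
    out["total"] = record["quantity"] * record["unit_price"]
    # Extraction of sub-information: the domain part of an email address.
    out["email_domain"] = str(record["email"]).split("@", 1)[-1]
    # Extraction of sub-information: year and month of the transaction date.
    d = record["date"]
    out["year_month"] = f"{d.year:04d}-{d.month:02d}"
    return out


sales = [{"quantity": 2, "unit_price": 4.5, "email": "ann@example.com", "date": date(2023, 3, 14)}]
enriched = [enrich(r) for r in sales]
```

The `year_month` field is also a convenient hook for the grouping done in the later steps.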
#6
Define the minimum level of detail shared by all Datasets (e.g. single transaction per day, groups of transactions per month, etc.).
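One reading of "minimum level of detail shared by all Datasets" is the finest grain that every Dataset can still support, which is the coarsest of their native grains: daily transactions and a monthly budget can only meet at month level. The sketch below encodes that reading; the grain names and `GRAIN_KEYS` mapping are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical grains, ordered from finest to coarsest.
GRAIN_ORDER: List[str] = ["day", "month", "year"]

# Each grain maps a date to the key that identifies its period.
GRAIN_KEYS: Dict[str, Callable[[date], str]] = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year:04d}-{d.month:02d}",
    "year": lambda d: f"{d.year:04d}",
}


def common_grain(native_grains: List[str]) -> str:
    """Return the finest grain all Datasets support, i.e. the coarsest native grain."""
    return max(native_grains, key=GRAIN_ORDER.index)


# Example: daily transactions plus a monthly budget meet at month level.
grain = common_grain(["day", "month"])
period_key = GRAIN_KEYS[grain]
```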
#7
For each Dataset, group the data at that same level of granularity.
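A sketch of grouping one Dataset to the shared grain, assuming the grain chosen in step #6 is the calendar month and that the measure should be summed. Both are assumptions: other measures may call for a count, average or last value instead. `group_to_grain` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List

Record = Dict[str, object]


def month_key(d: date) -> str:
    # Assumed common grain: calendar month.
    return f"{d.year:04d}-{d.month:02d}"


def group_to_grain(records: List[Record], measure: str,
                   key: Callable[[date], str] = month_key) -> Dict[str, float]:
    """Sum one measure per period so the Dataset sits at the shared granularity."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[key(r["date"])] += float(r[measure])
    return dict(totals)


sales = [
    {"date": date(2023, 3, 1), "amount": 10.0},
    {"date": date(2023, 3, 20), "amount": 5.0},
    {"date": date(2023, 4, 2), "amount": 7.5},
]
monthly_sales = group_to_grain(sales, "amount")
```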
#8
Join all formatted Datasets into a single Master Dataset, based on the granularity defined.
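Once every Dataset is keyed by the same period, the join reduces to matching on that key. This sketch uses a full outer join, an assumption chosen so that a period missing from one Dataset shows up as `None` rather than silently disappearing. `join_on_grain` and the sample values are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_on_grain(datasets: Dict[str, Dict[str, float]]) -> List[Row]:
    """Full outer join of per-period Datasets into one Master Dataset.

    Each input maps a period key to a value; a period absent from a Dataset
    gets None in that Dataset's column instead of being dropped.
    """
    periods = sorted(set().union(*(d.keys() for d in datasets.values())))
    master: List[Row] = []
    for period in periods:
        row: Row = {"period": period}
        for name, values in datasets.items():
            row[name] = values.get(period)
        master.append(row)
    return master


master = join_on_grain({
    "sales": {"2023-03": 15.0, "2023-04": 7.5},
    "refunds": {"2023-03": 2.0},
})
```

An inner join would be simpler but would drop April here, because it has no refunds.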
#9
In the resulting Master Dataset, check whether there is a subset of data to which any of the transformations in the toolbox should be applied.
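A typical case is the gap introduced by an outer join: the Master Dataset now contains missing values that none of the single Datasets had. The sketch reuses the selector-plus-transformation pattern on the Master rows. Treating a missing measure as 0.0 is an assumption (it means "no records in that period"), and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def fill_missing(columns: List[str]) -> Callable[[Row], Row]:
    # Reusable missing-value transformation, the same kind used on single Datasets.
    def transform(row: Row) -> Row:
        return {k: (0.0 if k in columns and v is None else v) for k, v in row.items()}
    return transform


def has_missing(columns: List[str]) -> Callable[[Row], bool]:
    # Selector for the subset of Master rows this transformation applies to.
    return lambda row: any(row.get(c) is None for c in columns)


def reapply_toolbox(master: List[Row],
                    toolbox: List[Tuple[Callable[[Row], bool], Callable[[Row], Row]]]) -> List[Row]:
    """Apply each (selector, transformation) pair only to the Master rows it selects."""
    for selector, transform in toolbox:
        master = [transform(r) if selector(r) else r for r in master]
    return master


measures = ["sales", "refunds"]
master = [
    {"period": "2023-03", "sales": 15.0, "refunds": 2.0},
    {"period": "2023-04", "sales": 7.5, "refunds": None},
]
master_clean = reapply_toolbox(master, [(has_missing(measures), fill_missing(measures))])
```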
#10
In the resulting Master Dataset, if needed, enrich the data with extra information (e.g. metrics from various Datasets combined to form a KPI, decoding based on a combination of fields, etc.).
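KPIs that combine measures from different source Datasets can only be computed after the join. The two KPIs below (net sales and refund rate) are hypothetical examples, and returning `None` when there were no sales is an assumed convention for an undefined ratio.

```python
from typing import Dict

Row = Dict[str, object]


def add_kpis(row: Row) -> Row:
    """Derive KPIs that need measures coming from more than one source Dataset."""
    out = dict(row)
    sales = float(row["sales"])
    refunds = float(row["refunds"])
    # Hypothetical KPI 1: net sales for the period.
    out["net_sales"] = sales - refunds
    # Hypothetical KPI 2: refund rate; undefined (None) when there were no sales.
    out["refund_rate"] = refunds / sales if sales else None
    return out


master = [
    {"period": "2023-03", "sales": 15.0, "refunds": 3.0},
    {"period": "2023-04", "sales": 0.0, "refunds": 0.0},
]
master_with_kpis = [add_kpis(r) for r in master]
```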
Knowledge Share Series 1: DATASETS
A "Divide et impera" (divide and conquer) approach to producing high-quality Datasets for data analysts.
