Summary of “Automating Data Preparation: Can We? Should We? Must We?”
N. Paton (2019), “Automating Data Preparation: Can We? Should We? Must We?”
Università degli Studi di Trieste, Department of Engineering and Architecture, Bachelor's degree in Electronic and Computer Engineering
Candidate: Samuele Bertollo. Academic year: 2019/2020. Supervisor: Prof. Eric Medvet

Introduction: Data Preparation
● Discovery, selection, integration and cleaning of existing data sets into a form that is suitable for analysis
● Done manually and divided into steps
● Automation principle: users specify what they want to obtain instead of how to obtain it
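
The "what instead of how" principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the target schema `TARGET`, the helper `prepare` and the sample row are all hypothetical and do not come from Paton's paper. The user declares the desired columns and types, and the helper decides how to map the source rows onto them.

```python
# Hypothetical illustration: the user states WHAT the result should look like
# (target column names and types); the helper decides HOW to get there.
TARGET = {"name": str, "price": float}


def prepare(rows, target):
    """Map each source row onto the target schema by normalised column name."""
    out = []
    for row in rows:
        # Normalise source column names so " Name" and "PRICE" can be matched.
        normalised = {key.strip().lower(): value for key, value in row.items()}
        # Keep only the declared columns and cast each value to the declared type.
        out.append({col: typ(normalised[col]) for col, typ in target.items()})
    return out


source = [{" Name": "widget", "PRICE": "9.50", "extra": "x"}]
prepared = prepare(source, TARGET)
```

Here the caller never writes the renaming or casting logic; changing `TARGET` is enough to change the result.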

The problem: automating data preparation
● What techniques do we have to automate it?
● How does the quality of the results differ between manual and automated approaches?
● When must we automate?

Why is it relevant?
● Time
● Cost
● The manual approach is not viable in some cases

What techniques do we have to automate it?
● Strategies:
1) Single steps
2) End-to-end problem
● Need for evidence:
– The more evidence, the better
– Example: data transformation (a single step)
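
The slide does not reproduce the data transformation example, so the following is a minimal sketch rather than the example from the paper. It assumes that the evidence is a set of input/output example pairs and that the system chooses among a small, hypothetical pool of candidate string transformations (`CANDIDATES`). It shows why more evidence helps: each extra example rules out candidates that are not consistent with it.

```python
# Hypothetical pool of candidate transformations the system may choose from.
CANDIDATES = {
    "upper": str.upper,
    "title": str.title,
    "strip": str.strip,
    "capitalize": str.capitalize,
}


def consistent(examples):
    """Return the names of the candidates that reproduce every example pair."""
    return sorted(
        name
        for name, func in CANDIDATES.items()
        if all(func(src) == dst for src, dst in examples)
    )


# One example pair is ambiguous: two candidates explain it.
one = consistent([("rome", "Rome")])
# A second pair removes the ambiguity.
two = consistent([("rome", "Rome"), ("new york", "New York")])

print(one)  # ['capitalize', 'title']
print(two)  # ['title']
```

With a single example both `capitalize` and `title` fit; the second example leaves only `title`.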

Comparing the quality of the results in manual and automated approaches
● Different situations give different results
● Data warehouse tasks: the manual approach will probably remain relevant
● Data lake tasks: few positive findings on automated single steps
● End-to-end automation or step by step?

When must we automate?
● Big data
● Lack of economic or human resources
● Some steps are hard to solve manually

Conclusion
● Big data will become more common, so automation will gain importance
● In some cases we must automate

Further research
● Comparing the quality of results between automatic and manual approaches
● End-to-end automation
● Automating all the different data preparation steps and varying the evidence used
