Slides - Summary of: "Automating Data Preparation: Can We? Should We? Must We?"
1. Summary of “Automating Data Preparation: Can We? Should We? Must We?”
N. Paton (2019), “Automating Data Preparation: Can We? Should We? Must We?”
Università degli Studi di Trieste, Department of Engineering and Architecture, Bachelor's degree programme in Electronic and Computer Engineering
Candidate: Samuele Bertollo. Academic Year: 2019/2020. Supervisor: Prof. Eric Medvet
2. Introduction: Data Preparation
● Discovery, selection, integration and cleaning of existing data sets into a form that is suitable for analysis
● Usually done manually and divided into steps
● Automation principle: users specify what they want to obtain instead of how to obtain it
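The automation principle can be illustrated with a minimal sketch, assuming a toy setting that is not taken from the paper: the user writes a declarative description of the desired result, and a small routine works out the selection, cleaning and deduplication steps. The names `spec` and `prepare`, and the keys of the spec, are hypothetical.

```python
# Hypothetical declarative spec: the user states WHAT the result should look like.
spec = {
    "columns": ["name", "city"],  # target schema
    "required": ["name"],         # rows missing these values are dropped
    "deduplicate": True,          # no repeated rows in the output
}

def prepare(rows, spec):
    """Derive the HOW (select, clean, deduplicate) from the spec."""
    out, seen = [], set()
    for row in rows:
        # Selection: keep only the target columns, trimming whitespace.
        rec = {c: (row.get(c) or "").strip() for c in spec["columns"]}
        # Cleaning: skip rows that lack a required value.
        if any(not rec[c] for c in spec["required"]):
            continue
        # Deduplication: skip rows already emitted.
        key = tuple(rec[c] for c in spec["columns"])
        if spec["deduplicate"] and key in seen:
            continue
        seen.add(key)
        out.append(rec)
    return out

raw = [
    {"name": " Ada ", "city": "London", "extra": 1},
    {"name": "Ada", "city": "London"},
    {"name": None, "city": "Paris"},
]
result = prepare(raw, spec)
print(result)
```

Changing the spec changes the result without touching `prepare`, which is the point of specifying what rather than how.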
3. The problem: automating data preparation
● Which techniques are available for automation?
● How does the quality of the results differ between manual and automated approaches?
● When must we automate?
4. Why is it relevant?
● Time
● Cost
● The manual approach is not viable in some cases
5. Which techniques are available for automation?
● Strategies:
  1) Single steps
  2) End-to-end problem
● Need for evidence:
  – The more, the better
  – Example: data transformation (a single step)
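The role of evidence in a single-step data transformation can be sketched as follows. This is only an illustration under stated assumptions, not the method described in the paper: the evidence is a set of input/output example pairs, and the hypothetical `CANDIDATES` pool and `consistent` function keep every transformation that reproduces all of them.

```python
# Hypothetical pool of candidate string transformations.
CANDIDATES = {
    "identity": lambda s: s,
    "upper": str.upper,
    "title": str.title,
    "strip": str.strip,
    "strip_title": lambda s: s.strip().title(),
}

def consistent(examples):
    """Return the names of candidates that reproduce every example pair."""
    return [
        name
        for name, transform in CANDIDATES.items()
        if all(transform(src) == dst for src, dst in examples)
    ]

# One example pair leaves two candidates that fit the evidence.
one = consistent([("trieste", "Trieste")])
# A second pair rules out the candidate that does not trim whitespace.
two = consistent([("trieste", "Trieste"), (" new york ", "New York")])
print(one, two)
```

With one example the choice is ambiguous; with two, only one candidate remains, which is why more evidence helps.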
6. Comparing the quality of the results in manual and automated approaches
● Different situations, different results
● Data warehouse tasks: the manual approach will probably remain relevant
● Data lake tasks: few positive findings on automated single steps
● End-to-end automation or step by step?
7. When must we automate?
● Big data
● Lack of economic or human resources
● Some steps are hard to solve manually
8. Conclusion
● Big data will become more common, so automation will gain importance
● In some cases we must automate
9. Further research
● Comparison of the quality of results between automatic and manual approaches
● End-to-end automation
● Automating all the different data preparation steps and varying the evidence used