Many companies struggle to turn their data into business knowledge. They know they have data, but then what? Some even know what to extract from it, but guess what happens? Garbage in, garbage out. ETL is a well-known approach to transforming data, but it has a number of drawbacks, so it is worth knowing its later variants, such as ECTL and ELT. We will go through them and also discuss the common problems with starting data processing in different industries. In this talk, Piotr shows the typical mistakes companies make when trying to build a data warehouse, and explains why small companies often fail to even start building their business on their data.
23. Transform
● Selecting only certain columns to load
● Translating coded values/enumerations
● Encoding free-form values
● Deriving a new calculated value
● Sorting or ordering the data
● Joining data from multiple sources
● De-duplicating the data
● Aggregating
● Generating surrogate-key values
● Transposing or pivoting
● Splitting a column into multiple columns
● Validating the data against reference tables
● Applying any form of data validation
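In practice these transformations are usually written in SQL or with a dataframe library. As a dependency-free illustration, here is a minimal Python sketch of several of them (column selection, translating coded values, a derived value, a join, splitting a column, sorting, surrogate keys, aggregation and pivoting). The `orders` and `customers` data, the `STATUS_CODES` enumeration and the `transform`, `aggregate` and `pivot` helpers are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical enumeration used by the source system (illustrative only).
STATUS_CODES = {"S": "shipped", "P": "pending", "C": "cancelled"}

# Hypothetical raw rows from two sources: an orders feed and a customer lookup.
orders = [
    {"order_id": "A2", "customer": "c2", "status": "P", "qty": 1,
     "unit_price": 100.0, "ship_to": "Paris, FR", "note": "rush"},
    {"order_id": "A1", "customer": "c1", "status": "S", "qty": 2,
     "unit_price": 9.5, "ship_to": "Berlin, DE", "note": ""},
    {"order_id": "A3", "customer": "c1", "status": "S", "qty": 3,
     "unit_price": 10.0, "ship_to": "Hamburg, DE", "note": ""},
]
customers = {"c1": {"segment": "retail"}, "c2": {"segment": "b2b"}}


def transform(orders, customers):
    out = []
    for row in orders:
        # Split one column ("City, CC") into two columns.
        city, _, country = row["ship_to"].partition(",")
        out.append({
            # Select only the columns to load ("note" is dropped).
            "order_id": row["order_id"],
            # Translate a coded value into a readable one.
            "status": STATUS_CODES.get(row["status"], "unknown"),
            # Derive a new calculated value.
            "amount": round(row["qty"] * row["unit_price"], 2),
            # Join with the second source.
            "segment": customers.get(row["customer"], {}).get("segment"),
            "city": city.strip(),
            "country": country.strip(),
        })
    # Sort the data, then generate surrogate keys in that order.
    out.sort(key=lambda r: r["order_id"])
    for surrogate_key, record in enumerate(out, start=1):
        record["order_sk"] = surrogate_key
    return out


def aggregate(rows):
    # Sum amounts per (country, status).
    totals = defaultdict(float)
    for r in rows:
        totals[(r["country"], r["status"])] += r["amount"]
    return dict(totals)


def pivot(totals):
    # Pivot: one row per country, one column per status.
    table = defaultdict(dict)
    for (country, status), amount in totals.items():
        table[country][status] = amount
    return dict(table)


rows = transform(orders, customers)
print(pivot(aggregate(rows)))
```

Each step is kept separate on purpose so it maps onto one bullet of the slide; a production pipeline would more likely push the same operations into the warehouse's SQL engine.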
24. Clean Transform
● Encoding free-form values
● Selecting only certain columns to load
● Translating coded values/enumerations
● Joining data from multiple sources
● De-duplicating the data
● Validating the data against reference tables
● Applying any form of data validation
● Deriving a new calculated value
● Sorting or ordering the data
● Aggregating
● Generating surrogate-key values
● Transposing or pivoting
● Splitting a column into multiple columns
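One reading of this reordered list is that the cleaning steps (encoding free-form values, translating, de-duplicating, validating) run before any calculation, so the later transformations only ever see trustworthy rows. The minimal Python sketch below illustrates that idea under stated assumptions: the field names, the `VALID_COUNTRIES` reference set and the `normalize_text`, `clean` and `transform` helpers are hypothetical, and this is not presented as the pipeline from the talk.

```python
import unicodedata

# Hypothetical reference table of valid country codes.
VALID_COUNTRIES = {"DE", "FR", "PL"}


def normalize_text(value):
    """Encode a free-form value into a canonical form
    (Unicode NFKC, collapsed whitespace, case-folded)."""
    value = unicodedata.normalize("NFKC", value)
    return " ".join(value.split()).casefold()


def clean(rows, valid_countries):
    """Normalize, de-duplicate and validate rows; return (accepted, rejected)."""
    seen = set()
    accepted, rejected = [], []
    for row in rows:
        record = {
            "email": normalize_text(row["email"]),
            "country": row["country"].strip().upper(),
            "amount": row["amount"],
        }
        # De-duplicate only after normalization, so " A@X.com " == "a@x.com".
        key = tuple(record.values())
        if key in seen:
            continue
        seen.add(key)
        # Validate against the reference table and a simple value rule.
        if record["country"] not in valid_countries:
            rejected.append((row, "unknown country"))
        elif not isinstance(record["amount"], (int, float)) or record["amount"] < 0:
            rejected.append((row, "invalid amount"))
        else:
            accepted.append(record)
    return accepted, rejected


def transform(records):
    """Aggregate clean records: total amount per country, sorted by country."""
    totals = {}
    for r in records:
        totals[r["country"]] = totals.get(r["country"], 0) + r["amount"]
    return dict(sorted(totals.items()))


raw = [
    {"email": " Anna@Example.com ", "country": "de", "amount": 10},
    {"email": "anna@example.com", "country": "DE ", "amount": 10},
    {"email": "bob@example.com", "country": "XX", "amount": 5},
    {"email": "cara@example.com", "country": "PL", "amount": -3},
    {"email": "dan@example.com", "country": "FR", "amount": 7},
]
accepted, rejected = clean(raw, VALID_COUNTRIES)
print(transform(accepted))
```

Keeping the rejected rows, together with a reason, rather than silently dropping them makes it possible to report on data quality instead of hiding it.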
36. Final Thoughts
● Keep in mind the hierarchy of needs
● Use ELCT rather than ETL
● Always:
○ Deduplicate data
○ Detect outliers in data
○ Gather metrics about data
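The three "always" rules can be illustrated with a small, dependency-free Python sketch. It assumes hypothetical row dictionaries with an `id` column as the de-duplication key. It uses the median-absolute-deviation "modified z-score" with a threshold of 3.5 (a common rule of thumb) as one possible outlier test, and it collects only row counts, null counts and distinct counts as metrics. The talk does not say which methods or metrics it has in mind, so treat these choices as placeholders.

```python
import statistics


def deduplicate(rows, key):
    """Keep the first row for each key value."""
    seen = set()
    unique = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def detect_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median
    absolute deviation) exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread around the median; this test cannot judge
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def gather_metrics(rows, columns):
    """Basic per-column metrics: row count, nulls and distinct values."""
    metrics = {"row_count": len(rows)}
    for col in columns:
        present = [r.get(col) for r in rows if r.get(col) is not None]
        metrics[f"{col}_nulls"] = len(rows) - len(present)
        metrics[f"{col}_distinct"] = len(set(present))
    return metrics


rows = [
    {"id": 1, "amount": 10.0, "country": "DE"},
    {"id": 2, "amount": 12.0, "country": "DE"},
    {"id": 2, "amount": 12.0, "country": "DE"},  # duplicate
    {"id": 3, "amount": 11.0, "country": None},
    {"id": 4, "amount": 13.0, "country": "FR"},
    {"id": 5, "amount": 12.0, "country": "PL"},
    {"id": 6, "amount": 500.0, "country": "PL"},
]
unique = deduplicate(rows, key=lambda r: r["id"])
print(detect_outliers([r["amount"] for r in unique]))
print(gather_metrics(unique, ["amount", "country"]))
```

A median-based test is used here instead of a plain mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in small samples.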