2. Elinaldo Monteiro
Developer at Jus.com.br
● 8 years of web development
● Python/Go/Ruby/PHP/Java/JavaScript
● @elinaldosoft
● elinaldo@jus.com.br
3. Jus.com.br
Founded 20 years ago with the aim of making law more
accessible, Jus has become a pioneering reference on the
internet for professionals, students, and anyone interested
in legal matters.
4. We are drowning in information and starving for knowledge.
Rutherford D. Rogers (1915 - 2015)
8. Extract
The first part of an ETL process involves extracting the
data from the source system(s).
Sources: CSV, relational databases, XML, .txt, JSON,
non-relational databases, XLS, websites (crawlers), images
(OCR), APIs, logs
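
As a minimal sketch of extraction, assuming just two of the sources listed above (a CSV export and a JSON payload), the Python standard library is enough to read both into one common shape, a list of dicts. The function names and the sample data are hypothetical and stand in for real source systems.

```python
import csv
import io
import json


def extract_csv(text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read a JSON array of objects into a list of dicts."""
    return json.loads(text)


# Hypothetical sample data standing in for real source systems.
csv_source = "name,gender\nAna,Female\nRui,Male\n"
json_source = '[{"name": "Lia", "gender": "Female"}]'

# Both extractors return the same shape, so results can be combined.
records = extract_csv(csv_source) + extract_json(json_source)
print(records)
```

Returning the same structure from every extractor keeps the later transform step independent of where each record came from.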
9. Transform
In the data transformation stage, a series of rules or
functions are applied to the extracted data in order to
prepare it for loading into the end target. Some data
does not require any transformation at all; such data is
known as "direct move" or "pass through" data.
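
One possible way to picture "rules or functions applied to the extracted data" is a mapping from field name to function, where any field without a rule is pass-through data. This is only a sketch; the `transform` helper, the `rules` mapping, and the sample row are assumptions made for illustration.

```python
def transform(record, rules):
    """Apply each field's rule; fields without a rule pass through unchanged."""
    return {
        field: rules.get(field, lambda value: value)(value)
        for field, value in record.items()
    }


# Hypothetical rules: only "name" and "qty" need transforming.
rules = {
    "name": str.strip,  # trim surrounding whitespace
    "qty": int,         # convert text to an integer
}

row = {"id": "42", "name": "  Ana ", "qty": "3"}
clean = transform(row, rules)
print(clean)
```

Here `id` has no rule, so it reaches the output exactly as it was extracted.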
10. Parse, pipeline
1. Translating coded values: (e.g., mapping "Male" to "M")
2. Deriving a new calculated value: (e.g., sale_amount =
qty * unit_price)
3. Sorting or ordering the data based on a list of columns to
improve search performance
4. Joining data from multiple sources (e.g., lookup, merge)
and deduplicating the data
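
The four operations above can be sketched in plain Python. The sample rows, the `customers` lookup table, and the `transform_sales` function are hypothetical; a real pipeline would more likely use a database or a dataframe library for the join and sort.

```python
# Hypothetical extracted rows; one sale appears twice.
sales = [
    {"customer_id": 2, "gender": "Male", "qty": 3, "unit_price": 10.0},
    {"customer_id": 1, "gender": "Female", "qty": 1, "unit_price": 25.0},
    {"customer_id": 2, "gender": "Male", "qty": 3, "unit_price": 10.0},
]
# Hypothetical lookup table coming from a second source.
customers = {1: "Ana", 2: "Rui"}

GENDER_CODES = {"Male": "M", "Female": "F"}


def transform_sales(rows, lookup):
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:  # 4. deduplicate identical rows
            continue
        seen.add(key)
        out.append({
            "customer_id": row["customer_id"],
            "customer": lookup.get(row["customer_id"]),     # 4. join via lookup
            "gender": GENDER_CODES[row["gender"]],          # 1. translate coded value
            "sale_amount": row["qty"] * row["unit_price"],  # 2. derive calculated value
        })
    # 3. sort by a list of columns
    return sorted(out, key=lambda r: (r["customer_id"], r["sale_amount"]))


result = transform_sales(sales, customers)
print(result)
```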
11. Load
The load phase writes the data into the end target,
which may be a simple delimited flat file, a data
warehouse, or an API.
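
A minimal sketch of the load step, assuming two targets: a delimited flat file (an in-memory buffer here) and an in-memory SQLite database standing in for a data warehouse. The table name, column names, and helper functions are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical transformed rows ready to be loaded.
rows = [
    {"customer": "Ana", "sale_amount": 25.0},
    {"customer": "Rui", "sale_amount": 30.0},
]


def load_csv(rows, fileobj):
    """Write rows to a comma-delimited file object with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["customer", "sale_amount"])
    writer.writeheader()
    writer.writerows(rows)


def load_sqlite(rows, conn):
    """Insert rows into a 'sales' table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (customer, sale_amount) "
        "VALUES (:customer, :sale_amount)",
        rows,
    )
    conn.commit()


buffer = io.StringIO()  # stands in for a real file on disk
load_csv(rows, buffer)

conn = sqlite3.connect(":memory:")  # stands in for a data warehouse
load_sqlite(rows, conn)
total = conn.execute("SELECT SUM(sale_amount) FROM sales").fetchone()[0]
print(total)
```

Because both loaders accept the same list of dicts, the target can be swapped without touching the extract or transform code.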