ETL
If you want to predict the future, study the past - Confucius
Python Brasil 14
Elinaldo Monteiro
Developer at Jus.com.br
● 8 years of web development
● Python/Go/Ruby/PHP/Java/JavaScript
● @elinaldosoft
● elinaldo@jus.com.br
Jus.com.br
Founded 20 years ago with the aim of making the law more
accessible, Jus has become a pioneering reference on the internet
for professionals, students, and everyone interested in legal
matters.
We are drowning in information and starving for knowledge.
Rutherford D. Rogers (1915 - 2015)
1. Memcached (2003)
2. WordPress (2003)
3. Rails (2004)
4. Nginx (2004)
5. Django (2005)
6. jQuery (2006)
7. Redis (2009)
8. MongoDB (2009)
9. Bootstrap (2011)
Moore's Law
What is ETL?
1. Extract
2. Transform
3. Load
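As a minimal sketch of how the three stages chain together, the Python below wires an extract, a transform, and a load step into one pipeline. The names `run_etl`, `extract`, `transform`, `source`, and `target` are illustrative assumptions, not something from the talk, and the in-memory lists stand in for real systems.

```python
from typing import Callable, Iterable, Iterator


def run_etl(
    extract: Callable[[], Iterable[str]],
    transform: Callable[[Iterable[str]], Iterator[str]],
    load: Callable[[str], None],
) -> int:
    """Run the three stages in order; return how many records were loaded."""
    count = 0
    for record in transform(extract()):
        load(record)
        count += 1
    return count


# Hypothetical in-memory source and target, used only for illustration.
source = ["  alice ", "bob  "]
target = []


def extract() -> Iterable[str]:
    # Extract: read raw records from the source system.
    yield from source


def transform(records: Iterable[str]) -> Iterator[str]:
    # Transform: clean each record before it is loaded.
    for record in records:
        yield record.strip().title()


# Load: here the "end target" is simply a Python list.
loaded = run_etl(extract, transform, target.append)
```

Because each stage is a generator or a plain callable, records flow through one at a time instead of being held in memory all at once.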
Extract
The first part of an ETL process involves extracting the
data from the source system(s).
CSV, Relational Databases, XML, .txt, JSON, Non-Relational
Databases, XLS, Web Sites (Crawlers), Images (OCR), APIs, Logs
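A rough sketch of extraction from two of the sources listed above (CSV and JSON), using only the Python standard library. The sample data, the field names, and the helper names `extract_csv` and `extract_json` are assumptions made for illustration; in-memory text buffers stand in for real files.

```python
import csv
import io
import json

# Hypothetical sample data standing in for real source files.
CSV_TEXT = "name,gender\nAna,Female\nBruno,Male\n"
JSON_TEXT = '[{"name": "Carla", "gender": "Female"}]'


def extract_csv(fileobj):
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(fileobj)


def extract_json(fileobj):
    """Yield each object from a JSON array."""
    yield from json.load(fileobj)


# Both extractors produce the same shape (dicts), so later stages
# do not need to know which source a record came from.
records = list(extract_csv(io.StringIO(CSV_TEXT)))
records += list(extract_json(io.StringIO(JSON_TEXT)))
```

With real files, `open(path, newline="")` for CSV and `open(path)` for JSON would replace the `io.StringIO` buffers.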
Transform
In the data transformation stage, a series of rules or
functions are applied to the extracted data in order to
prepare it for loading into the end target. Some data
does not require any transformation at all; such data is
known as "direct move" or "pass through" data.
Parse, pipeline
1. Translating coded values: (e.g., mapping "Male" to "M")
2. Deriving a new calculated value: (e.g., sale_amount =
qty * unit_price)
3. Sorting or ordering the data based on a list of columns to
improve search performance
4. Joining data from multiple sources (e.g., lookup, merge)
and deduplicating the data
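The four transformations above can be sketched in a few lines of Python. This is an illustration under assumed field names (`sale_id`, `customer_id`, `qty`, `unit_price`) and made-up sample records, not code from the talk.

```python
# 1. Mapping used to translate coded values.
GENDER_CODES = {"Male": "M", "Female": "F"}


def transform(sales, customers):
    by_id = {c["id"]: c for c in customers}  # lookup table for the join
    seen = set()
    out = []
    for sale in sales:
        if sale["sale_id"] in seen:  # 4. deduplicate on sale_id
            continue
        seen.add(sale["sale_id"])
        customer = by_id[sale["customer_id"]]  # 4. join via lookup
        out.append({
            "sale_id": sale["sale_id"],
            "customer": customer["name"],
            # 1. translate coded values ("Male" -> "M")
            "gender": GENDER_CODES.get(customer["gender"], customer["gender"]),
            # 2. derive a new calculated value
            "sale_amount": sale["qty"] * sale["unit_price"],
        })
    # 3. sort on a list of columns
    out.sort(key=lambda r: (r["customer"], r["sale_id"]))
    return out


# Hypothetical sample data.
customers = [
    {"id": 1, "name": "Bruno", "gender": "Male"},
    {"id": 2, "name": "Ana", "gender": "Female"},
]
sales = [
    {"sale_id": 10, "customer_id": 1, "qty": 2, "unit_price": 5.0},
    {"sale_id": 11, "customer_id": 2, "qty": 1, "unit_price": 3.0},
    {"sale_id": 10, "customer_id": 1, "qty": 2, "unit_price": 5.0},  # duplicate
]
rows = transform(sales, customers)
```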
Load
The load phase loads the data into the end target,
which may be a simple delimited flat file, a data
warehouse, or an API.
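A small sketch of two load targets: a delimited flat file and a SQL table. An in-memory SQLite database is used here only as a stand-in for a data warehouse, and the table name, column names, and sample rows are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical transformed rows, ready to be loaded.
rows = [
    {"customer": "Ana", "sale_amount": 3.0},
    {"customer": "Bruno", "sale_amount": 10.0},
]


def load_flat_file(rows, fileobj, delimiter=";"):
    """Write rows to a delimited flat file with a header line."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=["customer", "sale_amount"],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


def load_sqlite(rows, conn):
    """Insert rows into a SQL table (SQLite standing in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (customer, sale_amount) "
        "VALUES (:customer, :sale_amount)",
        rows,
    )
    conn.commit()


buffer = io.StringIO()  # stands in for an open file on disk
load_flat_file(rows, buffer)

conn = sqlite3.connect(":memory:")
load_sqlite(rows, conn)
total = conn.execute("SELECT SUM(sale_amount) FROM sales").fetchone()[0]
```

Loading into an API would follow the same pattern, with one HTTP request per record or per batch in place of the `INSERT`.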
What to do with it?
Any questions?
