Introduction to ETL
by
Maira Bay de Souza
About the Author
Maira Bay de Souza, BSc. Comp. Science
Working with
● software testing
● software development
since 2001
Talend ETL developer and tester since 2013
IBM, HP, SunLife, small businesses
What is ETL?
● Extract, Transform, Load
● A sequence of operations on the same dataset
● Datasets are sometimes joined together in the T step
● Simple transformations may also be done in the E or L steps
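The three steps can be sketched as three small functions chained together. This is only a minimal illustration using the Python standard library; the sample data, the column names and the function names are assumptions made for the example, not part of any ETL tool.

```python
import csv
import io

# Hypothetical input: in a real job this would come from a file, API or database.
RAW_CSV = "name,amount\nAna,10\nBob,\nCara,25\n"


def extract(text):
    """E: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: drop rows with a blank amount and convert the rest to int."""
    return [
        {"name": row["name"], "amount": int(row["amount"])}
        for row in rows
        if row["amount"].strip()
    ]


def load(rows):
    """L: serialise the rows back to CSV text (stand-in for a real target)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "amount"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# The same dataset flows through the sequence E -> T -> L.
result = load(transform(extract(RAW_CSV)))
print(result)
```

Keeping each step as its own function mirrors how ETL tools let you wire components together in sequence.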
Extract
Read any kind of static data source:
● Extract data from a website (HTML, JSON,
RSS, etc)
● Read files from a server (FTP, SCP, etc)
● Query a RESTful API
● Read from a database
● Read from a cloud storage service: Google Drive, Google Cloud Storage, AWS, Dropbox, etc.
● Read data from common business applications: SAP, Salesforce, SugarCRM, etc.
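As a rough sketch of two of the sources listed above, the following reads rows from a database and parses a JSON payload such as a RESTful API might return. It uses only the Python standard library; the `employees` table, the JSON fields and the function names are hypothetical, and an in-memory SQLite database stands in for a real server.

```python
import json
import sqlite3

# Hypothetical JSON payload, standing in for the body of a RESTful API response.
API_RESPONSE = '[{"id": 1, "login": "ana"}, {"id": 2, "login": "bob"}]'


def extract_from_json(payload):
    """Parse a JSON array into a list of dicts."""
    return json.loads(payload)


def extract_from_database(conn):
    """Read every row of the (hypothetical) employees table as a dict."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute("SELECT id, name FROM employees ORDER BY id")
    return [dict(row) for row in cursor]


# Build a throwaway in-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name) VALUES (?, ?)",
    [(1, "Ana Lima"), (2, "Bob Stone")],
)

api_rows = extract_from_json(API_RESPONSE)
db_rows = extract_from_database(conn)
conn.close()

print(api_rows)
print(db_rows)
```

Both extractors return the same shape (a list of dicts), which makes it easier to feed either source into the same transform step.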
Transform
Perform operations on the data as a whole:
● Split names into first, middle, last
● Filter out people with blank addresses
● Sort employees by % of sales target achieved
● Join data from an Excel file and a database
● Find duplicate names using Levenshtein distance
● Normalize or denormalize a list of addresses
● Split postal codes based on a regex
● Validate XML against an XSD
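A few of the transformations above can be sketched in plain Python. This is an illustrative example only: the sample records, the helper names and the edit-distance threshold of 2 for "likely duplicate" are all assumptions, and a real job would tune the threshold to its data.

```python
def split_name(full_name):
    """Split a full name into (first, middle, last); missing parts are empty."""
    parts = full_name.split()
    if not parts:
        return "", "", ""
    if len(parts) == 1:
        return parts[0], "", ""
    return parts[0], " ".join(parts[1:-1]), parts[-1]


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Hypothetical sample records.
people = [
    {"name": "Maria de Souza", "address": "1 Main St"},
    {"name": "Mario de Souza", "address": ""},
    {"name": "John Smith", "address": "9 Elm Ave"},
]

# Filter out people with blank addresses.
with_address = [p for p in people if p["address"].strip()]

# Flag pairs of names whose edit distance is at most 2 (assumed threshold).
likely_duplicates = [
    (a["name"], b["name"])
    for i, a in enumerate(people)
    for b in people[i + 1:]
    if levenshtein(a["name"].lower(), b["name"].lower()) <= 2
]

print(split_name("Maria de Souza"))
print(with_address)
print(likely_duplicates)
```

Comparing every pair of names is quadratic in the number of rows, so this approach suits small datasets; larger ones usually need some form of blocking or indexing first.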
Load
Output data in any kind of format:
● Save a CSV, XML, etc
● Insert or Update a table in a database
● Send a file in an email
● Make JSON available through a RESTful API
● Save data to a cloud storage service: Google Drive, Google Cloud Storage, AWS, Dropbox, etc.
● Save data to common business applications: SAP, Salesforce, SugarCRM, etc.
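The "insert or update" and JSON outputs above might look like the following sketch. It uses an in-memory SQLite database and SQLite's `INSERT OR REPLACE` statement as a stand-in for whatever upsert mechanism the target database offers; the table, the rows and the function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical rows produced by an earlier transform step.
rows = [{"id": 1, "name": "Ana Lima"}, {"id": 2, "name": "Bob Stone"}]


def load_to_database(conn, records):
    """Insert new rows, or replace existing ones that share the same id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employees (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO employees (id, name) VALUES (:id, :name)",
        records,
    )
    conn.commit()


def load_to_json(records):
    """Serialise rows as JSON text, ready to be saved or served by an API."""
    return json.dumps(records, indent=2)


conn = sqlite3.connect(":memory:")
load_to_database(conn, rows)
# A second run with a changed record updates it instead of duplicating it.
load_to_database(conn, [{"id": 2, "name": "Robert Stone"}])
stored = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
conn.close()

json_text = load_to_json(rows)
print(stored)
print(json_text)
```

Making the load step safe to re-run is a common design goal, since ETL jobs are often scheduled and occasionally retried.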
Example applications
● Find Twitter followers who are not Facebook followers and make their names and logins available as JSON via a RESTful API
● Join employee names from the HR database with sales records from the CRM and send a weekly email to the CMO with names and progress towards sales targets
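The core logic of both examples can be sketched without any external service. Everything here is hypothetical sample data: the logins, the field names and the report format are assumptions, and the steps that would actually call the social networks, the CRM or a mail server are left out.

```python
import json

# Example 1: followers on one network who are missing from the other.
twitter_followers = {"ana", "bob", "cara"}
facebook_followers = {"bob"}
twitter_only = sorted(twitter_followers - facebook_followers)
twitter_only_json = json.dumps(twitter_only)  # payload an API could serve

# Example 2: join HR names with CRM sales records and build a report body.
hr_rows = [
    {"employee_id": 1, "name": "Ana Lima"},
    {"employee_id": 2, "name": "Bob Stone"},
]
sales_rows = [
    {"employee_id": 1, "sold": 7500, "target": 10000},
    {"employee_id": 2, "sold": 12000, "target": 10000},
]


def build_report(hr, sales):
    """Join on employee_id and list progress towards target, best first."""
    names = {row["employee_id"]: row["name"] for row in hr}
    ranked = sorted(sales, key=lambda r: r["sold"] / r["target"], reverse=True)
    lines = []
    for record in ranked:
        name = names.get(record["employee_id"])
        if name is None:
            continue  # inner join: skip sales rows with no HR match
        percent = 100 * record["sold"] / record["target"]
        lines.append(f"{name}: {percent:.0f}% of target")
    return "\n".join(lines)


report = build_report(hr_rows, sales_rows)
print(twitter_only_json)
print(report)
```

In a full job, `report` would become the body of the weekly email and `twitter_only_json` the response of the API endpoint.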
Difference between ETL and WebApp
WebApp
● Reads one or more user inputs
or actions:
– forms filled
– button clicked
– etc
● Produces a result:
– page updated
– page loaded
– etc
ETL
● Reads one or more data inputs:
– table from database
– pages from RSS feed
– etc
● Produces another data output or
action:
– send email
– create Jasper Report
– etc
Tools
Tools:
1) Talend Open Studio
2) Pentaho Spoon/PDI (previously Kettle)
Features:
1) Free
2) Easy-to-use
3) Powerful
Demo
Live Demo:
creating a Talend ETL job
Questions/Requests
Questions?
I'd be glad to answer any of your ETL questions and requests. Click here to schedule a complimentary 15-minute Skype voice call.
Licensing
This presentation is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4
