Brief Introduction to ETL, showing:
- its advantages
- differences from normal applications
The Live Talend Demo was given at Software Niagara's DevTricks on Oct. 17, 2016.
If you want to learn more about ETL, please contact me.
SAS Online Training Institute in Hyderabad - C-Point (cpointss)
C-Point Software Solutions is a leading training institute in Hyderabad. We provide training on SAP, SAS, Oracle E-Business Suite, Informatica, OBIEE, SQL DBA, Hadoop, Cloud Computing, .NET, Testing Tools, Java, Web Designing, and PHP.
Light up Your Dark Data by Lance Ransom at QuantCon 2016 (Quantopian)
Quants are faced with a complex data environment. Data is everywhere and it's increasingly challenging to analyze, explore and evaluate, all in one language and in one environment. Quants need a unified environment where they are able to write expressions and conduct pushdown processes, all without having to move the data and having the ability to deploy anywhere, anytime. Organizations need to better marshal the data and have visibility to conduct a clean transformation. This session will discuss how businesses gain a better understanding of their data, leading to better results. In the FinServ industry, fluidity in understanding the data will help create better risk models and trading strategies. Ransom will discuss how organizations address these challenges and future-proof their work.
Join us and learn how to use Addins for some simple queries and data loads.
A brief overview of Lawson MS Addins. The presentation covers:
Introduction to Lawson Add-Ins for Microsoft Office
Navigation and logging in to the Addins
Query Wizard
Inquiries
Uploads
Advanced Tools
Troubleshooting
Express Yourself: Building Expressions with Microsoft Flow - SPSCLT 2018 (Fausto Capellan Jr)
This session was presented at the SharePoint Saturday Charlotte on August 11th, 2018. This session covers how to get started building expressions in Microsoft Flow.
Talend Online Training is offered at Glory IT Technologies. We have the best professionals for Talend training, and our trainers are highly experienced. We train students according to current IT requirements.
LINQ stands for Language Integrated Query.
A query is an expression that retrieves data from a data source or database.
It retrieves data from different data sources, such as an object collection, a SQL Server database, XML, a web service, etc.
Example LINQ syntax: var students = dbContext.Students.ToList();
Presented by Jennifer Hecker and Elizabeth Grumbach and hosted by the Texas Consortium on Digital Humanities, these are the slides for the TXDHC training webcast on OpenRefine, February 12th, 2015.
Data lineage has gained popularity in the Machine Learning community as a way to make models and datasets easier to interpret and to help developers debug their ML pipelines by enabling them to go from a model to the dataset/user who trained it. Data provenance and lineage is the process of building up the history of how a data artifact came to be. This history of derivations and interactions can provide a better context for data discovery, debugging, as well as auditing. In this area, others, such as Google and Databricks, have made small steps.
In the Hopsworks approach presented, provenance information is collected implicitly through unobtrusive instrumentation of Jupyter notebooks and Python code, which we call 'implicit provenance'.
How’d you like to learn to build a simple ProcessFlow in just 30 minutes? Join us for this webinar.
A brief overview of ProcessFlow. The presentation covers:
What is ProcessFlow?
Introduction to components and terminology
Building an actual flow
Implementation methodology
Troubleshooting
Although you may not have heard of JavaScript Object Notation Linked Data (JSON-LD), it is already impacting your business. Search engine giants such as Google have mandated JSON-LD as a preferred means of adding structured data to web pages to make them considerably easier to parse for more accurate search engine results. The Google use case is indicative of the larger capacity for JSON-LD to increase web traffic for sites and better guide users to the results they want.
Expectations are high for JSON-LD, and with good reason. JSON-LD effectively delivers the many benefits of JSON, a lightweight data interchange format, into the linked data world. Linked data is the technological approach supporting the World Wide Web and one of the most effective means of sharing data ever devised.
In addition, the growing number of enterprise knowledge graphs fully exploit the potential of JSON-LD as it enables organizations to readily access data stored in document formats and a variety of semi-structured and unstructured data as well. By using this technology to link internal and external data, knowledge graphs exemplify the linked data approach underpinning the growing adoption of JSON-LD—and the demonstrable, recurring business value that linked data consistently provides.
Join us to learn more about optimizing the unique document and graph database capabilities provided by AllegroGraph to develop or enhance your Enterprise Knowledge Graph using JSON-LD.
Totango is an Analytics platform for Customer Success.
Our data pipeline converts usage information into actionable analytics. The pipeline is managed with the Luigi workflow engine, and data transformations are done in Spark.
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ... (Amazon Web Services)
An advantage to leveraging Amazon Web Services for your data processing and warehousing use cases is the number of services available to construct complex, automated architectures easily. Using AWS Data Pipeline, Amazon EMR, and Amazon Redshift, we show you how to build a fault-tolerant, highly available, and highly scalable ETL pipeline and data warehouse. Coursera will show how they built their pipeline, and share best practices from their architecture.
The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading.
WEBINAR: Proven Patterns for Loading Test Data for Managed Package Testing (CodeScience)
Scratch orgs are extremely valuable tools for Salesforce developers, but due to their individual, disposable nature, a source of truth for QA data is often not accounted for. Without a single repository for QA data, many developers may be testing against incomplete data sets, skewing their results. In our latest tech webinar, we discuss implications planning for QA data can have on Salesforce development.
In this webinar, you will learn:
- Why it’s essential to have a plan in place early on how to deploy data to scratch orgs and QA orgs.
- Shortcuts which can inadvertently hide bugs that don't manifest until tested with real data, and lengthen the time it takes to complete a task.
- Strategies for maintaining data models as projects progress and as data is added or removed to stay realistic and current.
CodeScience Lead Salesforce Developer, Bobby Tamburrino will dive into these topics and provide key insights that can help ISVs succeed on the AppExchange.
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad... (Databricks)
Amundsen is the data discovery metadata platform that originated at Lyft and was recently donated to the Linux Foundation AI. Since it was open-sourced, Amundsen has been used and extended by many different companies in our community.
Slides from the Salesforce Bangalore developer group event organised at UrbanLadder on "Salesforce Connect".
Salesforce Connect is a framework that enables you to view, search, and modify data that’s stored outside your Salesforce org.
Pentaho Data Integration in Data Warehouse.
Open-source Pentaho provides business intelligence (BI) and data warehousing solutions at a fraction of the cost of proprietary solutions.
Pentaho Data Integration (PDI) provides the Extract, Transform, and Load (ETL) capabilities that facilitate the process of capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users and IoT technologies.
By Muhammad Ayaz Farid Shah, MSCS. Contact: 03446940736.
Using Databricks as an Analysis Platform (Databricks)
Over the past year, YipitData spearheaded a full migration of its data pipelines to Apache Spark via the Databricks platform. Databricks now empowers its 40+ data analysts to independently create data ingestion systems, manage ETL workflows, and produce meaningful financial research for our clients.
Warehousing Your Hits - The Why and How of Owning Your Data (Scott Arbeitman)
These are the slides from my recent presentation at Melbourne's Web Analytics Wednesdays. I talk about transitioning from collecting your data in primary digital analytics systems to storing it in a data warehouse or data lake.
Intro to Talend Open Studio for Data Integration (Philip Yurchuk)
An overview of Talend Open Studio for Data Integration, along with some tips learned from building production jobs and a list of resources. Feel free to contact me for more information.
The most important thing for any organization is data. There can be hundreds of front-end applications utilizing the same data for different purposes. Data plays an important role in any CMS application. This presentation touches on different viewpoints while migrating data from an external database to Sitecore CMS.
Using these details, we were able to successfully migrate over 500,000 records into Sitecore.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus can also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
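One of the techniques above, skipping computation on already-converged vertices, can be sketched in Python. This is a toy power-iteration implementation, not the STICD code; the damping factor and tolerance are conventional defaults, and the graph is assumed to have no dangling nodes.

```python
def pagerank(out_links, damping=0.85, tol=1e-6, max_iter=100):
    """Power iteration that stops updating vertices once they converge.

    out_links maps each vertex to the list of vertices it links to;
    the graph is assumed to have no dangling nodes.
    """
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}

    # Build the reverse adjacency list once, up front.
    in_links = {v: [] for v in out_links}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)

    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in out_links:
            if v in converged:  # skip work for vertices that have settled
                new_rank[v] = rank[v]
                continue
            incoming = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank[v] = (1 - damping) / n + damping * incoming
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:
            break
    return rank
```

On a three-vertex cycle a→b→c→a, every vertex settles at 1/3 after the first iteration, so the converged set fills immediately and the loop exits early.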
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
2. About the Author
Maira Bay de Souza, BSc. Computer Science
Working with:
● software testing
● software development
since 2001
Talend ETL developer and tester since 2013
IBM, HP, SunLife, small businesses
3. What is ETL?
● Extract, Transform, Load
● Sequence of operations on the same dataset
● Sometimes joining datasets together in T
● Simple Transformations may be done in E, L
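The whole E-T-L sequence can be sketched in a few lines of plain Python (not Talend); the sample data and column names here are made up:

```python
import csv
import io

# Hypothetical input: one "dataset" flowing through E, T and L.
RAW = """name,sales_target_pct
Alice Smith,103
Bob Jones,
Carol White,87
"""

def extract(text):
    """E: read a static data source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: filter out blank values, then sort by % of target achieved."""
    kept = [r for r in rows if r["sales_target_pct"].strip()]
    return sorted(kept, key=lambda r: float(r["sales_target_pct"]), reverse=True)

def load(rows):
    """L: output the data in some format (here, CSV text again)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "sales_target_pct"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

result = load(transform(extract(RAW)))
```

Note that the same dataset passes through all three stages in sequence, which is the defining shape of an ETL job.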
4. Extract
Read any kind of static data source:
● Extract data from a website (HTML, JSON, RSS, etc.)
● Read files from a server (FTP, SCP, etc.)
● Query a RESTful API
● Read from a database
● Read from a cloud storage unit: Google Drive, Google Storage, AWS, Dropbox, etc.
● Read data from common business applications: SAP, Salesforce, SugarCRM, etc.
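The "read from a database" case, sketched in Python; an in-memory SQLite table stands in for a real source system, and the table and column names are made up:

```python
import sqlite3

# Stand-in source system: an in-memory SQLite table with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "Toronto"), (2, "Bob", "Niagara Falls")],
)

def extract_customers(conn):
    """E: query the source system and return plain Python rows."""
    cursor = conn.execute("SELECT id, name, city FROM customers ORDER BY id")
    columns = [c[0] for c in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]

rows = extract_customers(conn)
```

Returning plain dicts keeps the extracted data independent of the source, so the Transform stage never needs to know where the rows came from.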
5. Transform
Make operations on the data as a whole:
● Split names into first, middle, last
● Filter out people with blank addresses
● Sort employees by % of sales target achieved
● Join data from an Excel file and a database
● Find duplicate names using Levenshtein distance
● Normalize or denormalize a list of addresses
● Split postal codes based on a regex
● Validate XML with an XSD
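Two of these transformations sketched in Python; the name-splitting convention and the Canadian-style postal-code pattern are illustrative choices, not the only possible ones:

```python
import re

def split_name(full_name):
    """Split "First [Middle ...] Last" into (first, middle, last)."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    middle = " ".join(parts[1:-1])
    return first, middle, last

# Canadian-style postal code "A1A 1A1", split into its two halves.
POSTAL = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")

def split_postal_code(code):
    """Return the two halves of the code, or None if it doesn't match."""
    match = POSTAL.match(code.strip().upper())
    return match.groups() if match else None
```

In a real job these functions would be applied row by row across the whole dataset, which is what "operations on the data as a whole" means in practice.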
6. Load
Output data in any kind of format:
● Save a CSV, XML, etc.
● Insert or update a table in a database
● Send a file in an email
● Make JSON available through a RESTful API
● Save data to a cloud storage unit: Google Drive, Google Storage, AWS, Dropbox, etc.
● Save data to common business applications: SAP, Salesforce, SugarCRM, etc.
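The "insert or update a table" case, sketched in Python as a SQLite upsert; the table and column names are made up, and a real target would be a warehouse or application table:

```python
import sqlite3

# Stand-in target: an in-memory SQLite table with a primary key.
# The ON CONFLICT upsert syntax needs SQLite >= 3.24.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT PRIMARY KEY, total REAL)")

def load_rows(conn, rows):
    """L: insert each row, or update it if the key already exists."""
    conn.executemany(
        "INSERT INTO sales (name, total) VALUES (:name, :total) "
        "ON CONFLICT(name) DO UPDATE SET total = excluded.total",
        rows,
    )

load_rows(conn, [{"name": "Alice", "total": 120.0}])
load_rows(conn, [{"name": "Alice", "total": 150.0}])  # updates, no duplicate
```

Making the load an upsert keeps the job rerunnable: loading the same batch twice leaves the target in the same state.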
7. Example applications
● Find Twitter followers who are not Facebook followers and make their names and logins available as JSON via a RESTful API
● Join employee names from the HR database with sales records from the CRM and send a weekly email to the CMO with names and progress towards sales targets
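The second example application, sketched in Python with made-up stand-ins for the HR and CRM data (all names and numbers are invented for the sketch):

```python
# Made-up stand-ins for the two sources.
employees = {101: "Alice Smith", 102: "Bob Jones"}   # from the HR database
sales = [
    {"emp_id": 101, "pct_of_target": 98},            # from the CRM
    {"emp_id": 102, "pct_of_target": 74},
]

# T: join on employee id and sort by progress; L: format the email body.
lines = [
    f"{employees[s['emp_id']]}: {s['pct_of_target']}% of target"
    for s in sorted(sales, key=lambda s: s["pct_of_target"], reverse=True)
]
email_body = "\n".join(lines)
```

The final step, actually sending the email, is the Load stage; here it is reduced to producing the message body.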
8. Difference between ETL and WebApp
WebApp
● Reads one or more user inputs or actions:
– forms filled
– button clicked
– etc.
● Produces a result:
– page updated
– page loaded
– etc.
ETL
● Reads one or more data inputs:
– table from a database
– pages from an RSS feed
– etc.
● Produces another data output or action:
– send an email
– create a Jasper Report
– etc.