Open Source ETL using Talend Open Studio

                                    Luís Santos
                                luis@luissantos.pt



                                 February 14, 2013




Luís Santos luis@luissantos.pt      Open Source ETL   February 14, 2013
Overview

1    Who am I?

2    What is ETL?

3    ETL Software Suites

4    Talend Open Studio for Data Integration

5    Hands on

6    Conclusion



Warning!!!




This presentation was created using LaTeX
                  Why?
             Because I can!




Who am I?




          Software Engineer and Mathematics Student
          Open Source addict
          PHP and Java Developer




What is ETL?


     In computing, Extract, Transform and Load (ETL) refers to a
     process in database usage and especially in data warehousing
     that involves:
             Extracting data from outside sources
             Transforming it to fit operational needs (which can include
             quality levels)
             Loading it into the end target (database, more specifically,
             operational data store, data mart or data warehouse)



        (2013, http://en.wikipedia.org/wiki/Extract,_transform,_load)
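
The three steps in the definition above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Talend implements it: the sample CSV, the `sales` table, and the "amount must be numeric" quality rule are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for an "outside source".
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,not-a-number
3,carol,7
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, parse amounts, drop rows failing the quality rule."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumed quality rule: skip rows whose amount is not numeric
        clean.append((int(row["id"]), row["name"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order and return what ended up in the target."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV, sqlite3.connect(":memory:")))
```

The target here is an in-memory SQLite database so the sketch runs anywhere; in a real job the load step would point at a data mart or data warehouse.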




ETL Software Suites




      Pentaho Data Integration (Kettle)
      SQL Server Integration Services
      Talend Open Studio for Data Integration
      etc...




Talend Open Studio for Data Integration


Talend Open Studio is a set of tools for developing, testing and deploying
application integration projects.
      Talend Open Studio for Big Data
      Bonita Open Solution (BPM)
      Talend Open Studio for Data Integration
      Talend Open Studio for Data Quality
      Talend ESB
      Talend Open Studio for MDM




Datasources (Extract and Load)




  MySQL, MSSQL, Oracle, SQLite, Firebird, XLS, CSV, XML, SOAP,
                  REST, HTTP, FTP, SSH, IMAP




Transformers (Transform)




      Sort data
      Convert data
      Cross-reference (join) data between datasources
      Filter data
      Fuzzy search
      Normalize and Denormalize data
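
To make the list above concrete, here is a rough sketch of each kind of transformation in plain Python. The `customers` and `orders` records are invented for the example, and "normalize" is read here as splitting a multi-valued field into one row per value (with "denormalize" as the reverse). That reading is an assumption, and Talend's own components may behave differently in detail.

```python
from difflib import get_close_matches

# Hypothetical records from two different sources.
customers = [
    {"id": "2", "name": "Bob", "tags": "vip,web"},
    {"id": "1", "name": "Alice", "tags": "web"},
]
orders = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 2, "total": 5.0},
    {"customer_id": 1, "total": 12.5},
]

# Convert: cast the id column from text to integer.
converted = [{**c, "id": int(c["id"])} for c in customers]

# Sort: order customers by id.
by_id = sorted(converted, key=lambda c: c["id"])

# Cross (join): attach the customer name to each order via the shared id.
names = {c["id"]: c["name"] for c in by_id}
joined = [{"name": names[o["customer_id"]], "total": o["total"]} for o in orders]

# Filter: keep only orders with a total of at least 10.
large = [row for row in joined if row["total"] >= 10]

# Fuzzy search: find the known name closest to a misspelled input.
match = get_close_matches("Alcie", list(names.values()), n=1)

# Normalize: one row per (name, tag) instead of a comma-separated field.
normalized = [(c["name"], tag) for c in by_id for tag in c["tags"].split(",")]

# Denormalize: regroup the tags back under each name.
denormalized = {}
for name, tag in normalized:
    denormalized.setdefault(name, []).append(tag)

if __name__ == "__main__":
    print(large, match, normalized, denormalized, sep="\n")
```

In Talend Open Studio each of these steps would be a component dropped onto the job canvas rather than hand-written code.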




Where and how?



     Where?
             Multi-platform (Linux, Mac OS X, *BSD, even Windows)
             You just need a JVM (Java Virtual Machine)
     How?
             Execute it from your favorite programming language by spawning it as an external process
             Command line
             From your JVM-based application (Java, Groovy, JRuby)
             Web services running on top of a Java application server (Tomcat, GlassFish)
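
The first two options in the "How?" list amount to starting the exported job as an external process. A minimal sketch of that in Python follows. It assumes the job was exported as a standalone archive containing a launcher shell script, and that the launcher accepts `--context_param name=value` arguments; check the script generated by your Talend version before relying on either. The launcher path and parameter names are hypothetical.

```python
import subprocess


def build_command(launcher, context_params):
    """Build the argument list for an exported job's launcher script.

    Assumes the launcher accepts repeated `--context_param name=value`
    arguments; verify this against the script your export produced.
    """
    cmd = ["sh", launcher]
    for key, value in context_params.items():
        cmd += ["--context_param", f"{key}={value}"]
    return cmd


def run_job(launcher, context_params, timeout=600):
    """Run the job as a child process and return what it printed."""
    result = subprocess.run(
        build_command(launcher, context_params),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"job failed ({result.returncode}): {result.stderr}")
    return result.stdout


if __name__ == "__main__":
    # Hypothetical launcher path and context parameters.
    print(build_command(
        "jobs/export_customers/export_customers_run.sh",
        {"db_host": "localhost", "limit": 100},
    ))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a parameter value contains spaces. The same pattern applies from PHP or any other language that can spawn a process.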




Hands on




     Querying data
     Joining data from multiple datasources
     Filtering and sorting data
     Exporting data
     Deploying your job
     Calling it from PHP




Database Schema




Example




“With great power comes great responsibility.”
                                         (Voltaire)




The End
    email: luis@luissantos.pt
    twitter: @santosluis87
    linkedin: https://www.linkedin.com/in/luissantos87



