ETL with Talend (data integration)

ETL operations using Talend Open Studio for Data Integration

Published in: Software, Technology

  1. ETL with Talend. POOJA B. MISHRA
  2. What is ETL?
     • Extract is the process of reading data from a database.
     • Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables, or by combining the data with other data.
     • Load is the process of writing the data into the target database.
  3. Process flow
  4. Terms closely related to and managed by ETL processes
     • Data migration
     • Data management
     • Data cleansing
     • Data synchronization
     • Data consolidation
  5. Different ETL tools
     • Oracle ETL
     • Ab Initio
     • Pentaho Data Integration (Kettle project, open source ETL)
     • SAS ETL Studio
     • Cognos DecisionStream
     • Business Objects Data Integrator (BODI)
     • Microsoft SQL Server Integration Services (SSIS)
     • Informatica PowerCenter
     • Talend
  6. Prerequisites
     • Talend Open Studio for Data Integration: http://www.talend.com/download
     • VirtualBox: https://www.virtualbox.org/wiki/Downloads
     • Hortonworks Sandbox VM: http://hortonworks.com/products/hortonworks-sandbox/#install
  7. How to set up: Step 1
  8. Step 2
  9. Step 3
  10. Step 4
  11. Step 5
  12. Talend interface
     • Workspace
     • Repository tree
     • Component configuration
     • Palette
  13. What are the supported data input formats?
     • SQL
     • MySQL
     • PostgreSQL
     • Sybase
     • Teradata
     • MSSQL
     • Netezza
     • Greenplum
     • Access
     • DB2
     • Hive
  14. What kinds of datasets can be loaded? Talend Studio offers nearly comprehensive connectivity to:
     • Packaged applications (ERP, CRM, etc.), databases, mainframes, files, Web Services, and so on, to address the growing disparity of sources.
     • Data warehouses, data marts, and OLAP applications, for analysis, reporting, dashboarding, scorecarding, and so on.
     • Built-in advanced components for ETL, including string manipulations, Slowly Changing Dimensions, automatic lookup handling, bulk load support, and so on.
  15. What are the data export formats?
  16. Talend ELT DB components: Step 1
  17. Step 2
  18. Step 3
  19. Step 4
  20. Step 5
  21. Step 6
  22. Challenges involved
     • Data volumes are growing exponentially.
     • Data velocity is increasing.
     • As information systems grow in complexity, the disparity of sources is growing as well.
     • The target structures all have different data transformation requirements and different tolerances in terms of latency.
     • Transformations involved in ETL processes can be highly complex.
  23. Thank You!!
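
The extract, transform, and load steps defined in slide 2 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not how Talend implements its jobs: it uses the standard-library sqlite3 module with in-memory databases, and the table names (customers, dim_customer), the sample rows, and the COUNTRY_NAMES lookup table are all hypothetical.

```python
import sqlite3

# Hypothetical lookup table used by the transform step (not from the slides).
COUNTRY_NAMES = {"IN": "India", "US": "United States"}


def extract(conn):
    """Extract: read the raw rows from the source database."""
    query = "SELECT id, name, country_code FROM customers ORDER BY id"
    return conn.execute(query).fetchall()


def transform(rows):
    """Transform: apply a rule (tidy the name) and a lookup (code -> country)."""
    return [
        (row_id, name.strip().title(), COUNTRY_NAMES.get(code, "Unknown"))
        for row_id, name, code in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Build a small sample source, run the three steps, return the target rows."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE customers (id INTEGER, name TEXT, country_code TEXT)"
    )
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  asha rao ", "IN"), (2, "john doe", "US"), (3, "ana", "BR")],
    )
    target = sqlite3.connect(":memory:")
    load(target, transform(extract(source)))
    query = "SELECT id, name, country FROM dim_customer ORDER BY id"
    return target.execute(query).fetchall()


if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Keeping the three steps as separate functions mirrors the definition in slide 2: the source and target are different databases, and the transform step touches neither of them.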
