2. Data Integration
Data integration involves combining data residing in different
sources and providing users with a unified view of these data.
Use Cases:
•Data Migration: One-time collection of data from different sources.
•Data Collection/Operational Integration [ETL/ELT]: Collecting
data from different sources at regular intervals.
•EAI/CDC: Bidirectional synchronization between data sources at
regular intervals or on the occurrence of events.
TOS-DI focuses on the first two.
3. Design Philosophy-DI Tools
Earlier, integration applications were developed from scratch. Visual
tools were later built around the common characteristics of data sources.
Every data source has two characteristics: a transport protocol to carry
data to and fro, and a structure for the data.
Ex. JDBC is a transport protocol and a table schema is a data format.
The transport protocol (which needs credentials to connect) can be
abstracted away elegantly, while the data format (flat rows/columns in
CSV or an RDBMS, JSON, or XML) can be expressed visually.
The two are decoupled, so a defined schema can be reused across
transports. Ex. While copying data from an RDBMS to a CSV file, you can
define the schema once (with data types etc.) and use it with both
transports.
Standard transformations like date/currency formatting, transposing,
pivoting, de-duplication, etc. can be expressed visually on data formats.
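The decoupling above can be sketched in plain Java (an illustrative sketch only; all names are made up here, and Talend's generated code is far more elaborate): one schema definition drives both a CSV "transport" and an RDBMS "transport".

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: one schema definition reused by two transports.
public class SchemaDemo {
    public record Column(String name, String sqlType) {}

    // The schema is defined once, independent of any transport.
    public static final List<Column> SCHEMA = List.of(
            new Column("id", "INT"),
            new Column("name", "VARCHAR(64)"));

    // CSV "transport": render the schema as a header row.
    public static String csvHeader(List<Column> schema) {
        return schema.stream().map(Column::name)
                .collect(Collectors.joining(","));
    }

    // RDBMS "transport": render the same schema as DDL.
    public static String createTable(String table, List<Column> schema) {
        return "CREATE TABLE " + table + " (" + schema.stream()
                .map(c -> c.name() + " " + c.sqlType())
                .collect(Collectors.joining(", ")) + ")";
    }
}
```

Changing the schema in one place updates both outputs, which is the reuse the visual tools exploit.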
4. Introduction to TOS-DI
•A full-featured free tool for Data Integration
•Eclipse based graphical modeling
•Generates Java code, transparently to developers
(Java Emitter Template based).
•Can be deployed as a standalone jar, web service,
OSGi bundle, etc.
•Enterprise support available.
5. Why Talend?
ETL/DI features:
•Metadata repository for managing schemas of
all data sources, plus context management
•Connectors for all major data sources (files,
DBMS, HTTP endpoints etc.) and data formats.
•Open source, with enterprise SLA support also available
•Supports deployment of ETL as a standalone jar,
web service, or OSGi bundle, and also has an
administrative console for deployment.
6. Why Talend?
Jobs can be included in a web application (as a
dependent jar), or any application can trigger a job.
Documentation and process modelling features are also included.
7. Why Talend?
Apart from these data integration features, it is
easy for Java developers to learn and work with:
•Eclipse-based, and generates Java code from
the drag-and-drop GUI.
•Allows Java code in the flow for customization, or
separate Java routines that can be reused across ETL
jobs.
•Can use any Java library in ETL, so it can parse any
file format and communicate over any transport
protocol (if not supported out of the box).
8. Why Talend?
Do whatever is possible in Java.
Uses Eclipse JET (Java Emitter Templates) to generate code.
UI-guided, with Eclipse-native Java debugging support.
9. Talend IDE Overview
Repository
•Business Model - Graphically represents a business
process (used by business people to convey the
process to developers).
•Job Design - Create/edit jobs. Author jobs in the
"Design" window by dragging and dropping components
from the palette and connecting them.
•Context - Global context shared across jobs.
The values of these variables will be the same for all
jobs.
10. Talend IDE Overview [Contd.]
Repository
Code - Java classes (with static methods) to reuse code
SQL Template - Templates for common SQL tasks
Metadata - Shared connections (database, FTP, etc.) and schemas
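A repository routine is just a plain Java class with static methods that component expressions can call. A minimal sketch (the class and method names here are illustrative, not built-in routines):

```java
// Illustrative sketch of a repository routine: a plain Java class whose
// static methods can be reused across jobs.
public class StringRoutines {
    // Trim and collapse internal whitespace; null-safe, as routine
    // helpers typically are so they can run on nullable columns.
    public static String normalize(String in) {
        if (in == null) return null;
        return in.trim().replaceAll("\\s+", " ");
    }
}
```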
11. Talend IDE Overview
Palette
•100s of components (searchable), categorized
into families. A component is a code template with
a form for providing parameters. It fits into a job flow
and provides/processes input/output depending on
its purpose. For example, tMysqlInput provides
data to the job flow, tMysqlOutput writes job-flow
data to the database, and tMap processes data input
into the job flow and provides the processed data back
to the flow.
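The idea of a component as a parameterized code template can be sketched abstractly (illustrative only; real Talend components are JET templates wired into generated code, not this interface):

```java
import java.util.List;
import java.util.function.Function;

// Illustrative abstraction: a component consumes rows, processes them
// according to its configured parameters, and emits rows downstream.
public class ComponentSketch {
    public interface Component<I, O> extends Function<List<I>, List<O>> {}

    // A tMap-like component, configured with a per-row transformation.
    public static <I, O> Component<I, O> map(Function<I, O> f) {
        return rows -> rows.stream().map(f).toList();
    }
}
```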
13. Talend IDE Overview
Views
•Run - Execute the job with the selected context group.
•Component - Configure the parameters of the component
selected in the Job design.
•Contexts - Create job-specific context variables and
groups, and provide values to context variables.
•Problems - See any errors or warnings in the job.
•Code Viewer - See the generated code for the
component selected in the Job design.
•Outline - See the list of components used in the job.
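Conceptually, a context group behaves like a set of named values chosen per environment (dev/prod). A self-contained sketch using java.util.Properties as a stand-in (the variable names and values are illustrative; in Talend the real values come from the Contexts view or context files):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Stand-in for a context group: same variable names, different
// values per environment. All names and values here are illustrative.
public class ContextDemo {
    public static Properties loadContext(String env) {
        String dev  = "db_host=localhost\ndb_port=3306";
        String prod = "db_host=db.example.com\ndb_port=3306";
        Properties p = new Properties();
        try {
            p.load(new StringReader("prod".equals(env) ? prod : dev));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader cannot fail here
        }
        return p;
    }
}
```

The job logic reads `db_host` the same way in every environment; only the selected group changes.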
15. Create Job in TOS-DI
A business analyst creates a business model and
shares it with developers.
Create metadata for data sources.
Create routines to be used in the job (if any; check
whether one is already built in).
Design Job
Create context variables
Configure components
Run Job
Export and deploy Job
16. Integration Plan
Extract data from an Excel file.
Transform it, then:
Load to MySQL for analytics.
Load to Elasticsearch for full-text search.
Send a success/failure email.
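The extract/transform steps of this plan can be outlined with in-memory stand-ins (every name below is a placeholder, not a Talend API; in the real job each step is a component such as tFileInputExcel, tMap, tMysqlOutput, and tSendMail):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch of the plan with in-memory stand-ins; all names
// here are illustrative, not Talend APIs.
public class PlanSketch {
    // "Extract": pretend these rows came from the Excel file.
    public static List<Map<String, String>> extract() {
        return List.of(Map.of("name", " Alice ", "city", "Pune"));
    }

    // "Transform": the tMap-style step; here it just trims every field.
    public static List<Map<String, String>> transform(
            List<Map<String, String>> rows) {
        return rows.stream()
                .map(r -> r.entrySet().stream()
                        .collect(Collectors.toMap(Map.Entry::getKey,
                                e -> e.getValue().trim())))
                .collect(Collectors.toList());
    }
}
```

The two load steps and the notification email would each be a further sink on the transformed rows.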