Talend Open Studio Data Integration

Talend Data Integration and Management

Data Integration

Data Integration involves combining data
residing in different sources and providing the
user with a unified view of the data

Data Management combines different disciplines
to manage data as a valuable resource

www.robertomarchetto.com

Talend

● Talend is a company focused on Data
Integration and Data Management solutions
● Named a „Cool Vendor“ by Gartner (2010)
● Present in more than 12 locations around the
world
● Fast growing company


Talend Open Studio

● Open Source, professional tool
● Draw procedures by linking components; each
component performs one operation
● DB vendor-specific optimized components
● Produces fully editable Java (or Perl) code
● Deployment as small, fast compiled Java
or as a Web Service
● Eclipse-based IDE, excellent flexibility
● BI platform independent, DB vendor independent

Automatic code generation, different
deployment options


Extraction, Transformation, Loading (ETL)

● ETL is a common process in Data Integration
● Extract: read data from different data sources
(databases, flat files, spreadsheet files, web
services, etc.)
● Transform: convert the data into a form that can
be placed in another container (database, web
services, files, etc.). Cleaning, computations and
verifications are also performed in this step
● Load: write the data in the target format
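
The three ETL steps can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the code Talend generates: the class `EtlSketch`, the `Customer` row and the in-memory source and target are all hypothetical stand-ins for a real flat file and warehouse table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EtlSketch {

    /** Target row; the field names are illustrative assumptions. */
    static final class Customer {
        final int id;
        final String name;
        final double revenue;

        Customer(int id, String name, double revenue) {
            this.id = id;
            this.name = name;
            this.revenue = revenue;
        }
    }

    /** Extract: read raw records from a source (an in-memory stand-in for a flat file). */
    static List<String> extract() {
        return Arrays.asList(
            "1, Alice ,1200.50",
            "2,Bob,not-a-number",
            "3,  Carol,300");
    }

    /** Transform: parse, clean (trim) and verify each record; invalid records are dropped. */
    static List<Customer> transform(List<String> lines) {
        List<Customer> rows = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // wrong number of fields
            }
            try {
                int id = Integer.parseInt(parts[0].trim());
                double revenue = Double.parseDouble(parts[2].trim());
                rows.add(new Customer(id, parts[1].trim(), revenue));
            } catch (NumberFormatException e) {
                // verification failed: skip the record
            }
        }
        return rows;
    }

    /** Load: write the rows to the target (a list standing in for a warehouse table). */
    static int load(List<Customer> rows, List<Customer> target) {
        target.addAll(rows);
        return rows.size();
    }

    public static void main(String[] args) {
        List<Customer> warehouse = new ArrayList<>();
        int loaded = load(transform(extract()), warehouse);
        System.out.println("Loaded " + loaded + " rows");
    }
}
```

In this sketch the second record fails verification because its revenue is not numeric, so only two of the three extracted records reach the target.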


Tutorial, Source data


Tutorial, Destination data (Data Warehouse)


Tutorial, Metadata

● Talend requires a preliminary definition of the
metadata
● As in programming languages, a strong metadata
definition often means fast, robust and
maintainable applications
● ..demo..


Tutorial, Talend jobs basics

● Place components on the designer
● Link components to build a transformation
● Main type of link: row flow
● Schema metadata is propagated between components
and must be consistent
● ..demo..
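
The idea of components linked by row flows, with schemas that must match at every link, can be modelled in a few lines of Java. This is a conceptual sketch only: `RowFlowSketch`, `Component`, `Job` and the two sample components are hypothetical names invented for illustration and are not part of Talend's API or of its generated code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class RowFlowSketch {

    /** A component declares its input and output schema and performs one operation per row. */
    static final class Component {
        final String name;
        final List<String> inputSchema;
        final List<String> outputSchema;
        final UnaryOperator<Object[]> operation;

        Component(String name, List<String> inputSchema, List<String> outputSchema,
                  UnaryOperator<Object[]> operation) {
            this.name = name;
            this.inputSchema = inputSchema;
            this.outputSchema = outputSchema;
            this.operation = operation;
        }
    }

    /** A job is a chain of components connected by row flows. */
    static final class Job {
        private final List<Component> chain = new ArrayList<>();

        /** Links the next component; rejects the link when the schemas do not match. */
        Job link(Component next) {
            if (!chain.isEmpty()) {
                Component prev = chain.get(chain.size() - 1);
                if (!prev.outputSchema.equals(next.inputSchema)) {
                    throw new IllegalArgumentException(
                        "Schema mismatch between " + prev.name + " and " + next.name);
                }
            }
            chain.add(next);
            return this;
        }

        /** Pushes every row through all components in link order. */
        List<Object[]> run(List<Object[]> rows) {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : rows) {
                Object[] current = row;
                for (Component c : chain) {
                    current = c.operation.apply(current);
                }
                out.add(current);
            }
            return out;
        }
    }

    /** Sample component: (id, name) -> (id, NAME). */
    static Component upperCaseName() {
        List<String> schema = Arrays.asList("id", "name");
        return new Component("upperCaseName", schema, schema,
            row -> new Object[] { row[0], ((String) row[1]).toUpperCase() });
    }

    /** Sample component: (id, name) -> (id, name, name_length). */
    static Component addNameLength() {
        return new Component("addNameLength",
            Arrays.asList("id", "name"),
            Arrays.asList("id", "name", "name_length"),
            row -> new Object[] { row[0], row[1], ((String) row[1]).length() });
    }

    public static void main(String[] args) {
        Job job = new Job().link(upperCaseName()).link(addNameLength());
        List<Object[]> input = new ArrayList<>();
        input.add(new Object[] { 1, "ann" });
        for (Object[] row : job.run(input)) {
            System.out.println(Arrays.toString(row)); // prints [1, ANN, 3]
        }
    }
}
```

Linking the two sample components in the opposite order throws an `IllegalArgumentException`, because the three-column output of `addNameLength` does not match the two-column input of `upperCaseName`.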


Tutorial, users_dimension


Test the job


Tutorial, accounts_dimension


Tutorial, dates_dimension


Tutorial, write a Java library


Tutorial, opportunities_fact


Tutorial, define a root job


Deploy and run


Extensibility, community plugins

● Many official components
● Components for every task released by the
community
● Geospatial components, log analysis, Google
Analytics, data encryption, etc.


Scheduler


And now.. reports, dashboards, OLAP,
Geoanalysis, KPIs..


Do you trust your data?


What about data quality?

● Customer A is present 5 times with different
names
● Null values can skew statistical measures such
as the mean
● Duplicated records
● Blank values
● Some records can contain errors (e.g. -1 field
values)
● Some records can be garbage
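
A few of these checks can be sketched in Java. The code below is a minimal illustration, not how Talend Open Profiler works: the class `QualityProfileSketch` and its methods are hypothetical, and the duplicate check only normalizes case and surrounding whitespace, which is far simpler than real name matching.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class QualityProfileSketch {

    /** Number of null values in a column. */
    static long countNulls(List<String> values) {
        return values.stream().filter(v -> v == null).count();
    }

    /** Number of non-null values that are empty or whitespace only. */
    static long countBlanks(List<String> values) {
        return values.stream().filter(v -> v != null && v.trim().isEmpty()).count();
    }

    /** Number of values that repeat an earlier one after trimming and lower-casing. */
    static long countDuplicates(List<String> values) {
        Set<String> seen = new HashSet<>();
        long duplicates = 0;
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) {
                continue; // nulls and blanks are counted separately
            }
            String key = v.trim().toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    /** Mean of a numeric column; nulls are either skipped or counted as zero. */
    static double mean(List<Double> values, boolean nullAsZero) {
        double sum = 0;
        int n = 0;
        for (Double v : values) {
            if (v == null && !nullAsZero) {
                continue;
            }
            sum += (v == null) ? 0.0 : v;
            n++;
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Acme", "ACME ", "acme", "Beta", null, "  ");
        List<Double> amounts = Arrays.asList(10.0, null, 20.0);

        System.out.println("nulls: " + countNulls(names));
        System.out.println("blanks: " + countBlanks(names));
        System.out.println("duplicates: " + countDuplicates(names));
        System.out.println("mean, nulls skipped: " + mean(amounts, false));
        System.out.println("mean, nulls as zero: " + mean(amounts, true));
    }
}
```

With the sample amounts, the mean is 15.0 when the null is skipped and 10.0 when it is treated as zero, which shows how the handling of nulls changes a statistical measure.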


Talend Open Profiler


What about data storage size?

● Some fields can be oversized for the data they
contain
● Sometimes fields are related and one can be
calculated from the other
● Some keys or values are never used
● When data grows, garbage grows too
● Data storage is not free (disks, electricity,
backups, DB licenses)
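
As a rough illustration of the first point, an oversized field can be spotted by comparing its declared width with the longest value it actually holds. The sketch below assumes a text column loaded into memory; `FieldSizeSketch`, its methods and the sample width of 255 are hypothetical and not taken from any Talend tool.

```java
import java.util.Arrays;
import java.util.List;

class FieldSizeSketch {

    /** Length of the longest value actually stored; nulls are ignored. */
    static int maxUsedLength(List<String> values) {
        int max = 0;
        for (String v : values) {
            if (v != null) {
                max = Math.max(max, v.length());
            }
        }
        return max;
    }

    /** Declared width minus the longest stored value, never negative. */
    static int unusedWidth(int declaredWidth, List<String> values) {
        return Math.max(0, declaredWidth - maxUsedLength(values));
    }

    public static void main(String[] args) {
        // Hypothetical country-code column declared with a width of 255 characters
        List<String> countryCodes = Arrays.asList("IT", "DE", null, "USA");
        System.out.println("longest value: " + maxUsedLength(countryCodes));
        System.out.println("unused width: " + unusedWidth(255, countryCodes));
    }
}
```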


Data is „the black gold“ that can produce
knowledge

● Data is a resource from which you can extract
knowledge
● A lot of data can be distilled into concise information
● Data storage is not free, and a lot of data can
slow systems down
● Data cleansing is a central process in statistical
analysis and Data Mining


Talend Master Data Management

