Talend Data Integration and Management
Talend Open Studio Data Integration

Talend Open Studio ETL tool, Talend Profiler and Data Management. Tot

Published in: Technology
Talend Open Studio Data Integration

  1. Talend Data Integration and Management
  2. Data Integration
     Data Integration involves combining data residing in different sources and providing the user with a unified view of the data.
     Data Management combines different disciplines to manage data as a valuable resource.
     www.robertomarchetto.com
  3. Talend
     ● Talend is a company focused on Data Integration and Data Management solutions
     ● Gartner named Talend a "Cool Vendor" (2010)
     ● Present in more than 12 locations around the world
     ● Fast-growing company
  4. Talend Open Studio
  5. Talend Open Studio
     ● Open source, professional tool
     ● Procedures are drawn by linking components; each component performs one operation
     ● Optimized components specific to each DB vendor
     ● Produces fully editable Java (or Perl) code
     ● Deploys as small, fast compiled Java or as a Web Service
     ● Eclipse-based IDE, excellent flexibility
     ● Independent of the BI platform and of the DB vendor
  6. Automatic code generation, different deployment options
  7. Extraction, Transformation, Loading
     ● ETL is a common process in Data Integration
     ● Extract: read data from different data sources (databases, flat files, spreadsheet files, web services, etc.)
     ● Transform: convert the data into a form that can be placed in another container (database, web services, files, etc.). Cleaning, computations and verifications are also performed in this step
     ● Load: write the data in the target format
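The three ETL steps on slide 7 can be sketched in plain Java. This is only an illustration under stated assumptions, not code generated by Talend: the class `EtlSketch`, its method names, the two-column CSV layout and the in-memory "target" list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of Extract / Transform / Load; not Talend-generated code.
class EtlSketch {

    // Extract: read raw rows from a source (in-memory CSV lines stand in for a flat file).
    static List<String[]> extract(List<String> csvLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : csvLines) {
            rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
        }
        return rows;
    }

    // Transform: verify and clean each row, then convert it to the target shape.
    static List<String> transform(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length != 2) continue;   // verification: drop malformed rows
            String name = row[0].trim();     // cleaning: strip surrounding spaces
            if (name.isEmpty()) continue;    // cleaning: drop rows without a name
            String country = row[1].trim().toUpperCase(Locale.ROOT);
            out.add(name + "|" + country);
        }
        return out;
    }

    // Load: write the records to the target (a list stands in for a warehouse table).
    static int load(List<String> records, List<String> target) {
        target.addAll(records);
        return records.size();
    }
}
```

Chaining the calls as `load(transform(extract(lines)), target)` mirrors the way linked components pass rows from one step to the next.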
  8. Tutorial, Source data
  9. Tutorial, Destination data (Data warehouse)
  10. Tutorial, Metadata
     ● Talend requires a preliminary definition of the metadata
     ● As in programming languages, a strong metadata definition often means fast, robust and maintainable applications
     ● (demo)
  11. Tutorial, Talend jobs basics
     ● Place components on the designer
     ● Link components to build a transformation
     ● Main type of link: row flow
     ● Schema metadata is propagated along the links and must stay consistent
     ● (demo)
  12. Tutorial, users_dimension
  13. Test the job
  14. Tutorial, accounts_dimension
  15. Tutorial, dates_dimension
  16. Tutorial, write a Java library
  17. Tutorial, opportunities_fact
  18. Tutorial, define a root job
  19. Deploy and run
  20. Extensibility, community plugins
     ● Many official components
     ● Components for every task released by the community
     ● Geospatial components, log analysis, Google Analytics, data encryption, etc.
  21. Scheduler
  22. And now: reports, dashboards, OLAP, geoanalysis, KPIs
  23. Do you trust your data?
  24. What about data quality?
     ● Customer A is present 5 times under different names
     ● Null values can distort statistical indexes such as the mean
     ● Duplicated records
     ● Blank values
     ● Some records can contain errors (e.g. field values of -1)
     ● Some records can be garbage
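The data quality problems listed on slide 24 can be made concrete with a few checks. The sketch below is hypothetical and is not the Talend Open Profiler API: the class `QualityProfileSketch`, its method names, the normalization rule (trim and lower-case) and the choice of `-1` as an error marker are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical profiling helpers; not part of any Talend product.
class QualityProfileSketch {

    // Counts values that repeat an earlier value once case and surrounding spaces are ignored.
    static int countDuplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String name : names) {
            if (name == null) continue;
            String key = name.trim().toLowerCase(Locale.ROOT);
            if (!seen.add(key)) duplicates++; // add() returns false if the key was already present
        }
        return duplicates;
    }

    // Counts null, empty or whitespace-only values.
    static int countNullOrBlank(List<String> values) {
        int count = 0;
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) count++;
        }
        return count;
    }

    // Naive mean: nulls count as zero and error markers are included.
    static double meanTreatingNullAsZero(List<Integer> values) {
        if (values.isEmpty()) return Double.NaN;
        double sum = 0;
        for (Integer v : values) sum += (v == null) ? 0 : v;
        return sum / values.size();
    }

    // Mean over valid values only: nulls and the error marker are skipped.
    static double meanOfValid(List<Integer> values, int errorMarker) {
        double sum = 0;
        int count = 0;
        for (Integer v : values) {
            if (v == null || v == errorMarker) continue;
            sum += v;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For the values 10, null, 20 and -1, the naive mean is 7.25 while the mean of the valid values is 15.0, which shows how nulls and error markers shift a statistic.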
  25. Talend Open Profiler
  26. What about data storage size?
     ● Some fields can be oversized for the data they contain
     ● Sometimes fields are related and one can be calculated from another
     ● Some keys or values are never used
     ● As data grows, garbage grows with it
     ● Data storage is not free (disks, electricity, backups, DB licenses)
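The first point on slide 26, fields oversized for the data they contain, can be checked by comparing a column's declared length with the longest value actually stored. This is a minimal sketch under assumptions: `FieldSizeSketch` and its methods are hypothetical names, and the declared length is passed in by hand rather than read from database metadata.

```java
import java.util.List;

// Hypothetical helper for spotting oversized text fields; not a Talend API.
class FieldSizeSketch {

    // Longest non-null value observed in the column (0 if there is none).
    static int maxObservedLength(List<String> values) {
        int max = 0;
        for (String v : values) {
            if (v != null) max = Math.max(max, v.length());
        }
        return max;
    }

    // Declared length that no stored value currently uses (never negative).
    static int unusedLength(int declaredLength, List<String> values) {
        return Math.max(0, declaredLength - maxObservedLength(values));
    }
}
```

Whether the unused declared length actually costs storage depends on the database and the column type, so the result is a hint for review rather than a measurement of wasted space.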
  27. Data is "the black gold" that can produce knowledge
     ● Data is a resource from which you can extract knowledge
     ● A large amount of data can be distilled into concise information
     ● Data storage is not free, and a large amount of data can slow a system down
     ● Data cleansing is a central process in statistical analysis and Data Mining
  28. Talend Master Data Management