Talend Open Studio Data Integration

Talend Open Studio ETL tool, Talend Profiler and Data Management. Tot

Published in: Technology

Transcript

  • 1. Talend Data Integration and Management
  • 2. Data Integration
      ● Data Integration involves combining data residing in different sources and providing the user with a unified view of the data
      ● Data Management combines different disciplines to manage data as a valuable resource
      www.robertomarchetto.com
  • 3. Talend
      ● Talend is a company focused on Data Integration and Data Management solutions
      ● Talend is a "Cool Vendor" according to Gartner (2010)
      ● Present in more than 12 locations around the world
      ● Fast-growing company
  • 4. Talend Open Studio
  • 5. Talend Open Studio
      ● Open source, professional tool
      ● Procedures are drawn by linking components; each component performs one operation
      ● Optimized components for specific DB vendors
      ● Produces fully editable Java (or Perl) code
      ● Deployment as small, fast compiled Java programs or as a Web Service
      ● Eclipse-based IDE, excellent flexibility
      ● Independent of the BI platform and of the DB vendor
  • 6. Automatic code generation, different deployment options
  • 7. Extraction, Transformation, Loading
      ● ETL is a common process in Data Integration
      ● Extract: read data from different data sources (databases, flat files, spreadsheet files, web services, etc.)
      ● Transform: convert the data into a form that can be placed in another container (database, web services, files, etc.); cleaning, computations and verifications are also performed in this step
      ● Load: write the data in the target format
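
The three ETL steps on slide 7 can be sketched in plain Java, outside Talend, to show where each responsibility lives. This is a minimal illustration under stated assumptions, not code generated by Talend Open Studio: the semicolon-separated source format `id;name;amount`, the `Customer` target row and the method names are all hypothetical, and an in-memory `List` stands in for the target database table.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Plain-Java ETL sketch (not Talend-generated code).
// Assumed source format (hypothetical): semicolon-separated lines "id;name;amount".
final class EtlSketch {

    // Target row after transformation.
    static final class Customer {
        final int id;
        final String name;
        final long amountCents;

        Customer(int id, String name, long amountCents) {
            this.id = id;
            this.name = name;
            this.amountCents = amountCents;
        }
    }

    // Extract: read raw lines and split them into fields.
    static List<String[]> extract(List<String> rawLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : rawLines) {
            rows.add(line.split(";", -1)); // -1 keeps trailing empty fields
        }
        return rows;
    }

    // Transform: clean (trim), verify (field count, blank name, numbers)
    // and compute (decimal amount -> integer cents). Invalid rows are dropped.
    static List<Customer> transform(List<String[]> rows) {
        List<Customer> out = new ArrayList<>();
        for (String[] f : rows) {
            if (f.length != 3) {
                continue; // wrong number of fields
            }
            String name = f[1].trim();
            if (name.isEmpty()) {
                continue; // blank name
            }
            try {
                int id = Integer.parseInt(f[0].trim());
                long cents = new BigDecimal(f[2].trim()).movePointRight(2).longValueExact();
                out.add(new Customer(id, name, cents));
            } catch (NumberFormatException | ArithmeticException e) {
                // non-numeric id/amount, or more than two decimals: skip the row
            }
        }
        return out;
    }

    // Load: write rows into the target; a List stands in for a warehouse table.
    static void load(List<Customer> rows, List<Customer> target) {
        target.addAll(rows);
    }
}
```

For example, feeding the lines `1; Alice ;12.50`, `2;;3.00` and `x;Bob;1` through `extract`, `transform` and `load` keeps only Alice (1250 cents): the second row has a blank name and the third a non-numeric id. Keeping each step in its own method roughly mirrors how a job separates input, transformation and output components.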
  • 8. Tutorial, Source data
  • 9. Tutorial, Destination data (data warehouse)
  • 10. Tutorial, Metadata
      ● Talend requires the metadata to be defined up front
      ● As in programming languages, a strong metadata definition often means fast, robust and maintainable applications
      ● ..demo..
  • 11. Tutorial, Talend jobs basics
      ● Place components on the designer
      ● Link components to build a transformation
      ● Main type of link: Row flow
      ● Schema metadata is propagated and must be coherent
      ● ..demo..
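
Slides 10 and 11 stress that schema metadata flows along the links between components and must stay coherent. The toy model below, which is not Talend's API, makes that rule explicit: each hypothetical `Component` declares the columns it expects and the columns it emits, and the hypothetical `Job.link` refuses a connection whose schemas do not match, so an incoherent design fails when it is built rather than when data runs through it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model (not Talend's API): a job is a chain of components joined by a row flow.
final class Component {
    final String name;
    final List<String> inputSchema;   // columns expected on the incoming row flow
    final List<String> outputSchema;  // columns emitted on the outgoing row flow
    final UnaryOperator<Map<String, Object>> operation; // work done on each row

    Component(String name, List<String> inputSchema, List<String> outputSchema,
              UnaryOperator<Map<String, Object>> operation) {
        this.name = name;
        this.inputSchema = inputSchema;
        this.outputSchema = outputSchema;
        this.operation = operation;
    }
}

final class Job {
    private final List<Component> chain = new ArrayList<>();

    // Link a component after the last one; the upstream output schema must equal
    // the downstream input schema, otherwise the link is rejected.
    Job link(Component next) {
        if (!chain.isEmpty()) {
            Component last = chain.get(chain.size() - 1);
            if (!last.outputSchema.equals(next.inputSchema)) {
                throw new IllegalStateException("Schema mismatch: " + last.name
                        + " emits " + last.outputSchema + " but " + next.name
                        + " expects " + next.inputSchema);
            }
        }
        chain.add(next);
        return this;
    }

    // Push every row (copied first) through the components in order.
    List<Map<String, Object>> run(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> current = new LinkedHashMap<>(row);
            for (Component c : chain) {
                current = c.operation.apply(current);
            }
            out.add(current);
        }
        return out;
    }
}
```

For instance, linking a component that expects only `id` after one that emits `id` and `name` throws an `IllegalStateException` naming both components. This sketch compares column names in order only; a real schema would also carry types, lengths and nullability.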
  • 12. Tutorial, users_dimension
  • 13. Test the job
  • 14. Tutorial, accounts_dimension
  • 15. Tutorial, dates_dimension
  • 16. Tutorial, write a Java library
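
Slide 16 only names this step. In Talend, custom Java is commonly added as a user routine, a class of static helper methods that component expressions can call; the sketch below assumes that shape. The class name `DateDimRoutines` and its methods are hypothetical, chosen to show the kind of helpers the dates dimension of slide 15 might need: a `yyyyMMdd` surrogate key, the quarter, the day name and a weekend flag.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical helper library of static methods (the usual shape of a routine),
// e.g. for populating a dates dimension. Names are illustrative assumptions.
final class DateDimRoutines {

    private DateDimRoutines() { } // static helpers only, no instances

    // Surrogate key in yyyyMMdd integer form, e.g. 2010-03-15 -> 20100315.
    static int dateKey(LocalDate d) {
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // Calendar quarter, 1..4.
    static int quarter(LocalDate d) {
        return (d.getMonthValue() - 1) / 3 + 1;
    }

    // English day name, e.g. "Monday".
    static String dayName(LocalDate d) {
        return d.getDayOfWeek().getDisplayName(TextStyle.FULL, Locale.ENGLISH);
    }

    // True for Saturday and Sunday (ISO numbering: 1 = Monday ... 7 = Sunday).
    static boolean isWeekend(LocalDate d) {
        return d.getDayOfWeek().getValue() >= 6;
    }
}
```

For example, `DateDimRoutines.dateKey(LocalDate.of(2010, 3, 15))` returns `20100315`, and `quarter` returns `1` for that date. Keeping the helpers static and stateless makes them easy to call from any job and to unit test outside the Studio.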
  • 17. Tutorial, opportunities_fact
  • 18. Tutorial, define a root job
  • 19. Deploy and run
  • 20. Extensibility, community plugins
      ● Many official components
      ● Community-released components for many different tasks
      ● Geospatial components, log analysis, Google Analytics, data encryption, etc.
  • 21. Scheduler
  • 22. And now... reports, dashboards, OLAP, geoanalysis, KPIs...
  • 23. Do you trust your data?
  • 24. What about data quality?
      ● Customer A is present 5 times with different names
      ● Null values can distort statistics such as the mean
      ● Duplicated records
      ● Blank values
      ● Some records can contain errors (e.g. -1 field values)
      ● Some records can be garbage
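
The issues listed on slide 24 can be measured before deciding how to clean them. The sketch below is not Talend Open Profiler's API; it assumes a column is available as an in-memory `List`, and `DataQuality` and its methods are hypothetical names. It counts nulls, blanks and duplicates in a text column, counts a `-1` sentinel in a numeric column, and shows how the treatment of nulls changes the mean.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Minimal column-profiling sketch; not Talend Open Profiler's API.
final class DataQuality {

    // Counters for one text column.
    static final class TextProfile {
        int nulls;
        int blanks;
        int duplicates;
    }

    // Counts nulls, blank (whitespace-only) values and duplicates.
    // Duplicates are matched after trimming and lower-casing only.
    static TextProfile profile(List<String> values) {
        TextProfile p = new TextProfile();
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (v == null) {
                p.nulls++;
                continue;
            }
            String key = v.trim().toLowerCase(Locale.ROOT);
            if (key.isEmpty()) {
                p.blanks++;
            } else if (!seen.add(key)) {
                p.duplicates++;
            }
        }
        return p;
    }

    // Mean over non-null values only (NaN if there are none).
    static double meanIgnoringNulls(List<Integer> values) {
        long sum = 0;
        int count = 0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    // Mean with nulls counted as 0: shows how careless null handling shifts the result.
    static double meanNullsAsZero(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
            }
        }
        return values.isEmpty() ? Double.NaN : (double) sum / values.size();
    }

    // Counts a sentinel such as -1 that often marks an erroneous or unknown value.
    static int countSentinel(List<Integer> values, int sentinel) {
        int n = 0;
        for (Integer v : values) {
            if (v != null && v == sentinel) {
                n++;
            }
        }
        return n;
    }
}
```

With the amounts `10, 20, null, 30`, the mean over non-null values is 20.0, while counting the null as 0 gives 15.0. The duplicate check only trims and lower-cases, so it catches `ACME ` and `acme` but not genuinely different spellings of customer A; those need fuzzier matching.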
  • 25. Talend Open Profiler
  • 26. What about data storage size?
      ● Some fields can be oversized for the data they contain
      ● Sometimes fields are related, so one can be calculated from the others
      ● Some keys or values are never used
      ● When data grows, garbage grows too
      ● Data storage is not free (disks, electricity, backups, DB licenses)
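
Two of the storage questions on slide 26, oversized fields and keys that are never used, can be checked with simple counts. In this sketch the declared column length and the set of defined keys are passed in by the caller, an assumption made to keep it self-contained; a real check would read them from the database metadata. `FieldSizeCheck` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: compare declared column sizes and defined keys with what the data actually uses.
// Declared lengths and key sets are supplied by the caller (assumption).
final class FieldSizeCheck {

    // Longest stored value in characters; nulls are ignored.
    static int maxLength(List<String> values) {
        int max = 0;
        for (String v : values) {
            if (v != null && v.length() > max) {
                max = v.length();
            }
        }
        return max;
    }

    // Flags the column when its longest value uses less than `threshold`
    // (a fraction between 0 and 1) of the declared length.
    static boolean isOversized(List<String> values, int declaredLength, double threshold) {
        if (declaredLength <= 0) {
            throw new IllegalArgumentException("declaredLength must be positive");
        }
        return (double) maxLength(values) / declaredLength < threshold;
    }

    // Keys defined (e.g. in a lookup table) but never referenced by any row.
    static Set<Integer> unusedKeys(Set<Integer> definedKeys, List<Integer> referencedKeys) {
        Set<Integer> unused = new HashSet<>(definedKeys);
        unused.removeAll(referencedKeys);
        return unused;
    }
}
```

For example, a country-code column declared with 255 characters whose longest value is 2 characters is flagged with a threshold of 0.5, and a lookup table with keys 1, 2 and 3 referenced only by 1 and 2 reports key 3 as unused.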
  • 27. Data is "the black gold" that can produce knowledge
      ● Data is a resource from which you can extract knowledge
      ● A lot of data can be condensed into concise information
      ● Data storage is not free, and large volumes of data can slow systems down
      ● Data cleansing is a central process in statistical analysis and Data Mining
  • 28. Talend Master Data Management
