CloverETL Basic Training Excerpt

Excerpt from the CloverETL Basic Training slides.

The basic course lasts 3 days and covers basic principles, CloverETL Designer walkthrough, transaction analysis, lookups, database connections, working with structured data, XML etc.

More at www.cloveretl.com/services/training

Published in: Technology

CloverETL Basic Training Excerpt

  1. Basic Training Course for CloverETL software
     Training teaser: excerpt from Basic Training Course
     All rights reserved Javlin 2011
  2. Training Course Documentation
     • This presentation accompanies the training course delivery
     • It can serve as a baseline for self-study
     • The course focuses on the fundamentals of the CloverETL platform that are needed for graph development and management
     • This document includes additional topics that are intended as introductions to more advanced concepts and techniques
     • The additional topics are not a formal part of the course; they may or may not be referenced during class time, depending on factors such as time constraints and project relevance
  3. Training Course Objectives
     • On successful completion of this course you will be able to:
       › Develop solutions to business problems using the CloverETL platform
       › Compose graphs using Designer and Engine components
       › Describe data formats with metadata definitions
       › Access data from multiple sources including files and databases
       › Detect and react to errors in data
       › Optimize your existing graphs
       › Deploy and manage graphs in the CloverETL Server environment
  4. Agenda DAY 1
     • Introduction
     • Basic Principles
     • Getting Started
     • Designer Walkthrough
     • Transaction Analysis
  5. Agenda DAY 2
     • Graphs for Real World
     • Customer Profile Analysis
     • Lookups: Searching in Data
  6. Agenda DAY 3
     • Database Datasources
     • Working with Structured Data
     • XML input/output
     • Final Review
     • Test
     • Q&A
  7. B6. Task Discussion
     Sometimes data need to be enriched with referential information:
     • Who are the debtors?
     Steps:
       › Find the identifiers of customers who have a negative personal balance
       › Look up details for all such customers: first and last name
     How:
       › Use lookup tables to prepare the data for searching
       › Use the LookupJoin component to search the table
  8. Lookup Tables
     Lookup tables are data structures that allow fast searches over data:
     • Simple lookup is a hash table in memory
     • Database lookup is a database table with a local cache
     • Range lookup allows performing range queries
       › "Is the value A in range <10,20> or (20,100>?"
     • Persistent lookup uses index files to search data
     • Aspell lookup allows similarity search over strings
       › "Find matches for keyword 'car'": "Bar, card, cars"
  9. Lookup Table Structure
     Data stored in lookup tables has the following structure:
     • Search key
       › One or multiple fields
     • Return value
       › Returned when a match with the key is found
       › Some tables allow storing duplicate keys
       › More than one match can be found
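The key/value structure described on this slide can be sketched outside CloverETL. The Python below is only an illustration under stated assumptions: `build_lookup`, `get_matches`, and the sample branch records are hypothetical names and data, not part of the CloverETL API or the course material. It shows a search key built from several fields and a table that keeps every record stored under a duplicate key.

```python
from collections import defaultdict


def build_lookup(records, key_fields):
    """Index records by a search key made of one or more fields."""
    table = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        table[key].append(record)  # records with a duplicate key accumulate
    return table


def get_matches(table, *key_values):
    """Return every record stored under the key, or an empty list."""
    return table.get(tuple(key_values), [])


# Hypothetical sample data, not taken from the course.
branches = [
    {"country": "CZ", "city": "Prague", "branch": "Centrum"},
    {"country": "CZ", "city": "Prague", "branch": "Smichov"},
    {"country": "US", "city": "Boston", "branch": "Downtown"},
]
by_city = build_lookup(branches, ["country", "city"])
```

Calling `get_matches(by_city, "CZ", "Prague")` returns both Prague records, which corresponds to the "more than one match" case on the slide.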
  10. Populating Lookup Tables
      Data for a lookup table can be provided by several means:
      • Manual data entry
        › Data are part of the lookup table definition
      • File reference
        › Table definition contains the URL of the input file
        › Metadata describe the format of the input file
        › Simple parsing
      • Dynamic population
        › Designated component for writing into lookup tables
        › Data can be created dynamically by a graph
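As a rough analogue of the "file reference" option, the sketch below fills an in-memory lookup from pipe-delimited text, with a list of field names standing in for the metadata. The function name `load_lookup`, the field names, and the sample rows are assumptions made for illustration; CloverETL itself configures this through the lookup table definition rather than code like this.

```python
import csv
import io


def load_lookup(source, field_names, key_field, delimiter="|"):
    """Parse delimited lines and index each parsed row by one key field."""
    reader = csv.DictReader(source, fieldnames=field_names, delimiter=delimiter)
    return {row[key_field]: row for row in reader}


# Hypothetical input; in practice this would be an open file object.
SAMPLE = "1|John|Smith\n2|Jane|Doe\n"

customers = load_lookup(
    io.StringIO(SAMPLE),
    field_names=["customer_id", "first_name", "last_name"],  # plays the role of metadata
    key_field="customer_id",
)
```

All values, including the key, stay strings here because no type information is applied during parsing.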
  11. Using Lookup Tables
      • Lookup tables are reusable and can be accessed from all reformat-like components
      • Reduce the size of the lookup by reducing record width and including only applicable records in it
      • A lookup table must fit into memory or the graph will fail
        › Does not apply to database and persistent lookups
      • Comparable to Hash Join in performance
      • Offer more flexibility than joiners for partial matching
  12. Component LookupTableReaderWriter
      • The component can read or write the contents of lookup tables of any type
      • Use the component to:
        › Dynamically populate a lookup table with data
        › Prepare the data for lookup when advanced parsing is needed
        › Dump a lookup table into a file or database
      • Found in the Others section of the Component Palette
      • To configure the component, you need to provide:
        › Target lookup table
  13. B6. Complete Graph Section
      Step B6. Populate lookup table with data
      Key points:
      • Use the Simple lookup table type
      • Drop unnecessary fields prior to loading into the table
      • Split the graph into two phases, 0 and 1
  14. Component LookupJoin
      • The LookupJoin component searches a lookup table for a match with records from the regular data flow
      • Use the component to:
        › Search any kind of lookup table for a match
        › Find records that did not have any match
        › Comfortably handle multiple matches
      • Found in the Joiners section of the Component Palette
      • To configure the component, you need to provide:
        › Lookup table
        › Joining key
  15. B6. Complete Graph Section
      Step B6. Populate lookup table with data
      Key points:
      • Use ExtFilter to find customers with a negative balance
      • Use LookupJoin to search the lookup table
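The two key points of this step, filtering and then joining against the lookup, can be sketched in plain Python. This is not CloverETL code: `find_debtors`, the field names, and the sample records are hypothetical, and the two returned lists only loosely mirror how a lookup join separates matched records from those without a match.

```python
def find_debtors(balances, customer_lookup):
    """Keep negative balances, then enrich them with customer details."""
    matched, unmatched = [], []
    for record in balances:
        if record["balance"] >= 0:  # filter step: only debtors pass
            continue
        details = customer_lookup.get(record["customer_id"])
        if details is None:
            unmatched.append(record)  # no entry found in the lookup
        else:
            matched.append({**record, **details})
    return matched, unmatched


# Hypothetical sample data; the lookup keeps only the fields needed later.
customer_lookup = {1: {"first_name": "John", "last_name": "Smith"}}
balances = [
    {"customer_id": 1, "balance": -250.0},
    {"customer_id": 2, "balance": 80.0},
    {"customer_id": 3, "balance": -40.0},
]
debtors, unknown = find_debtors(balances, customer_lookup)
```

With this data, customer 1 ends up in `debtors` with a name attached, customer 2 is filtered out, and customer 3 lands in `unknown` because the lookup has no entry for that identifier.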
  16. B7. Task Discussion
      Range queries can be used to group similar records:
      • What level of risk do the debtors impose?
      Steps:
        › Use three risk levels: low, medium, high
        › Risk level is assigned based on the amount of money owed
      How:
        › Use a range lookup table to accommodate the range query
        › Use lookup(<table_name>).get() to search the table from transformation code
  17. Range Lookup Definition
      • Data for range lookup (each row gives an interval range followed by the return value):
          -1000|0|Low
          -10000|-1000|Medium
          -1000000|-10000|High
      • Notes
        › Only the first match is returned, so the order of the data matters
        › A null value in the range definition means "unlimited"
          • Data to match everything: ||the rest
      (Slide callouts: interval range, return value, interval inclusivity.)
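A minimal Python sketch of the first-match behaviour noted on this slide, using the same three rows plus the catch-all row. It is not the CloverETL range lookup itself: `range_lookup` is a hypothetical helper, and treating both interval bounds as inclusive is an assumption, since the slide's inclusivity settings did not survive in the transcript.

```python
from typing import Optional

# Rows follow the slide data: start, end, return value.
# None stands for a null bound, meaning "unlimited" on that side.
RISK_RANGES = [
    (-1000, 0, "Low"),
    (-10000, -1000, "Medium"),
    (-1000000, -10000, "High"),
    (None, None, "the rest"),  # matches everything, so it must come last
]


def range_lookup(value, ranges=RISK_RANGES) -> Optional[str]:
    """Return the value of the first interval that contains `value`.

    Assumption: both bounds are inclusive.
    """
    for start, end, result in ranges:
        above_start = start is None or start <= value
        below_end = end is None or value <= end
        if above_start and below_end:
            return result
    return None
```

Because the intervals share their boundary values under this assumption, a balance of exactly -1000 falls into both the first and second rows; the first row wins, which is why the order of the data matters.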
  18. B7. Complete Graph Section
      Step B7. What level of risk do the debtors impose?
      Key points:
      • Use a range lookup to create risk level intervals
      • Use Reformat and lookup() to perform the search
