Basic Training Course
       for CloverETL software

                                                  Training teaser ―
                                  excerpt from Basic Training Course


All rights reserved Javlin 2011
Training Course Documentation

           This presentation accompanies the training course
            delivery
           It can serve as a baseline for self-study
           The course focuses on fundamentals of CloverETL
            platform which are needed for graph development
            and management
           This document includes additional topics which are
            intended to be used as introductions to more
            advanced concepts and techniques
           The additional topics are not a formal part of the
            course; they may or may not be referenced during the
            class time depending on factors such as time
            constraints and project relevance
Training Course Objectives

           On successful completion of this course you will be
                able to:

                    › Develop solutions to business problems using CloverETL
                      platform
                    › Compose graphs using Designer and Engine components
                    › Describe data formats with metadata definitions
                    › Access data from multiple sources including files and
                      databases
                    › Detect and react to errors in data
                    › Optimize your existing graphs
                    › Deploy and manage graphs in CloverETL Server
                      environment
Agenda

           DAY 1

           Introduction

           Basic Principles

           Getting Started

           Designer Walkthrough

           Transaction Analysis




Agenda

           DAY 2

           Graphs for Real World

           Customer Profile Analysis

           Lookups: Searching in Data




Agenda

           DAY 3

           Database Datasources

           Working with Structured Data

           XML input/output

           Final Review

           Test

           Q&A




B6. Task Discussion

          Sometimes data need to be enriched with referential
          information:
           Who are the debtors?

           Steps:
                    › Find identifiers of customers who have a negative personal
                      balance
                    › Look up details for all such customers – first and last name.

           How:
                    › Use lookup tables to prepare the data for searching
                    › Use LookupJoin component to search the table


Lookup Tables

           Lookup tables are data structures that allow fast
           searches over data

           Simple lookup is a hash table in memory
           Database lookup is a database table with local cache
           Range lookup allows performing range queries
                    › “Is the value A in range <10,20> or (20,100> ?”

           Persistent lookup uses index files to search data
           Aspell lookup allows similarity search over strings
                    › “Find matches for keyword ‘car’”. “Bar, card, cars”
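
The range query quoted above depends on whether each end point belongs to the interval. A minimal sketch of that check, assuming `<` and `>` mark an included end point and `(` marks an excluded one; the function and parameter names are illustrative and not part of CloverETL:

```python
def in_interval(value, start, end, include_start=True, include_end=True):
    """Return True if value lies in the interval, honoring end-point inclusivity."""
    above_start = value >= start if include_start else value > start
    below_end = value <= end if include_end else value < end
    return above_start and below_end


# <10,20> includes both end points; (20,100> excludes 20 and includes 100.
print(in_interval(20, 10, 20))                        # True
print(in_interval(20, 20, 100, include_start=False))  # False
```

With these settings the value 20 belongs to exactly one of the two ranges, which is why the inclusivity of each end point has to be part of the range definition.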

Lookup Table Structure

           Data stored in lookup tables has the following
           structure:

           Search key
                    › One or multiple fields


           Return value
                    › Returned when a match with key is found
                    › Some tables allow storing duplicate keys
                    › More than one match can be found
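
The key and value structure can be pictured with an in-memory map. This is only a sketch of the idea, not CloverETL's implementation: the search key is a tuple of two fields, and each key holds a list so that duplicate keys can return more than one match. All record values and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (first_name, last_name) is the search key,
# customer_id is the return value.
records = [
    ("John", "Smith", 101),
    ("Jane", "Doe", 102),
    ("John", "Smith", 103),  # duplicate key
]

lookup = defaultdict(list)
for first_name, last_name, customer_id in records:
    lookup[(first_name, last_name)].append(customer_id)


def get_matches(table, key):
    """Return all values stored under key, or an empty list if there is no match."""
    return table.get(key, [])


print(get_matches(lookup, ("John", "Smith")))  # [101, 103]
```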



Populating Lookup Tables

            Data for a lookup table can be provided by several means:

            Manual data entry
                     › Data are part of lookup table definition

            File reference
                     › Table definition contains URL of the input file
                     › Metadata describe format of input file
                     › Simple parsing

            Dynamic population
                     › Designated component for writing into lookup tables
                     › Data can be created dynamically by a graph
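
The file reference option can be sketched as simple parsing of delimited text into a keyed structure. The sample data, field names, and the `load_lookup` helper below are assumptions made for illustration; the list of field names plays the role that metadata plays in a table definition.

```python
import csv
import io

# Hypothetical pipe-delimited input; in practice this would be read from the
# file that the table definition points to.
SAMPLE_DATA = "101|John|Smith\n102|Jane|Doe\n"


def load_lookup(source, key_field, field_names, delimiter="|"):
    """Parse delimited text into a dict of rows keyed by key_field."""
    reader = csv.DictReader(source, fieldnames=field_names, delimiter=delimiter)
    return {row[key_field]: row for row in reader}


customers = load_lookup(
    io.StringIO(SAMPLE_DATA), "id", ["id", "first_name", "last_name"]
)
print(customers["101"]["last_name"])  # Smith
```

Anything beyond this kind of simple parsing is a case for populating the table dynamically from a graph instead.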



Using Lookup Tables

            Lookup tables are reusable and can be accessed from
                 all reformat-like components.

            Reduce the size of the lookup by reducing record
                 width and including only applicable records in it.

            Lookup table must fit into memory or the graph will
                 fail
                     › does not apply to database and persistent lookups

            Comparable to Hash Join in performance

            Offer more flexibility than joiners for partial matching

Component LookupTableReaderWriter

            The component can read or write the contents of
                 lookup tables of any type

            Use the component to:
                     › Dynamically populate lookup table with data
                     › Prepare the data for lookup when advanced parsing is
                       needed
                     › Dump lookup table into file or database

            Found in the Others section of Component Palette

            To configure the component, you need to provide:
                     › Target lookup table


B6. Complete Graph Section

            Step B6. Populate lookup table with data

            Key points:
              Use Simple lookup table type
              Drop unnecessary fields prior to loading into table.
              Split the graph into two phases, 0 and 1.




Component LookupJoin

            The LookupJoin component searches a lookup table for
                 matches with records from the regular data flow.

            Use the component to:
                     › Search any kind of lookup table for a match.
                     › Find records that did not have any match
                     › Comfortably handle multiple matches

            Found in the Joiners section of Component Palette

            To configure the component, you need to provide:
                     › Lookup table
                     › Joining key
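
The behavior described above can be approximated in a few lines. This is a sketch under stated assumptions, not the component's actual implementation: the lookup table is modeled as a dict from key to a list of rows, and `lookup_join` is a hypothetical helper that returns joined rows and unmatched records separately.

```python
def lookup_join(records, table, key_field):
    """Split records into joined rows and records without any match.

    table maps a key to a list of matching lookup rows, so a duplicate
    key yields one joined row per match.
    """
    joined, unmatched = [], []
    for record in records:
        matches = table.get(record[key_field], [])
        if not matches:
            unmatched.append(record)
        for match in matches:
            joined.append({**record, **match})
    return joined, unmatched


# Hypothetical data: key "A" has two matches, key "B" has none.
table = {"A": [{"name": "x"}, {"name": "y"}]}
records = [{"id": "A", "amount": 1}, {"id": "B", "amount": 2}]
joined, unmatched = lookup_join(records, table, "id")
print(len(joined), len(unmatched))  # 2 1
```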


B6. Complete Graph Section

            Step B6. Find the debtors and look up their details

            Key points:
              Use ExtFilter to find customers with negative
            balance.
              Use LookupJoin to search lookup table
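
The whole B6 task can be sketched outside the Designer as two phases: first populate a lookup that keeps only the needed fields, then filter for negative balances and search the lookup. The sample data and field names are hypothetical and are not taken from the course files.

```python
# Hypothetical sample data; field names are illustrative only.
balances = [
    {"customer_id": 1, "balance": 250.0},
    {"customer_id": 2, "balance": -1200.0},
    {"customer_id": 3, "balance": -50.0},
]
customer_rows = [
    {"customer_id": 1, "first_name": "Ann", "last_name": "Lee", "city": "Oslo"},
    {"customer_id": 2, "first_name": "Bob", "last_name": "Ng", "city": "Rome"},
    {"customer_id": 3, "first_name": "Cy", "last_name": "Roe", "city": "Bonn"},
]

# Phase 0: populate the lookup, dropping fields the search does not need.
names_by_id = {
    row["customer_id"]: (row["first_name"], row["last_name"])
    for row in customer_rows
}

# Phase 1: keep negative balances (the filter step), then look up each name.
debtors = [
    (row["customer_id"], *names_by_id[row["customer_id"]], row["balance"])
    for row in balances
    if row["balance"] < 0 and row["customer_id"] in names_by_id
]
print(debtors)  # [(2, 'Bob', 'Ng', -1200.0), (3, 'Cy', 'Roe', -50.0)]
```

Running the population step to completion before the search starts mirrors the split into phases 0 and 1: the lookup must be complete before anything reads from it.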




B7. Task Discussion

           Range queries can be used to group similar records:
            What level of risk do the debtors pose?

            Steps:
                     › Use three risk levels: low, medium, high
                     › Risk level is assigned based on amount of money owed


            How:
                     › Use range lookup table to accommodate the range query
                     › Use lookup(<table_name>).get() to search the table from
                       transformation code


Range Lookup Definition

             Data for range lookup (interval start | interval end | return value):
            -1000|0|Low
            -10000|-1000|Medium
            -1000000|-10000|High

             [Figure: range lookup definition, with callouts for interval range, return value, and interval inclusivity]

             Notes
                      › Only first match is returned -> order of data matters
                      › null value in range definition means “unlimited”
                               • Data to match everything:
                                 ||the rest
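
The first-match rule and the meaning of null can be sketched as follows. This is not CloverETL code: `range_lookup` is a hypothetical helper, `None` stands in for null, and the choice of an inclusive start with an exclusive end is an assumption, since the actual inclusivity is set in the table definition.

```python
# Rows mirror the range data above: (start, end, return value).
# None means "unlimited"; the last row is the catch-all from the notes.
RISK_RANGES = [
    (-1000, 0, "Low"),
    (-10000, -1000, "Medium"),
    (-1000000, -10000, "High"),
    (None, None, "the rest"),
]


def range_lookup(ranges, value):
    """Return the value of the first interval containing value, else None.

    Assumes the start is inclusive and the end is exclusive.
    """
    for start, end, result in ranges:
        if (start is None or value >= start) and (end is None or value < end):
            return result
    return None


print(range_lookup(RISK_RANGES, -500))   # Low
print(range_lookup(RISK_RANGES, -5000))  # Medium
```

If the catch-all row were listed first, every search would return "the rest", which is why the order of the data matters.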

B7. Complete Graph Section

            Step B7. What level of risk do the debtors pose?

            Key points:
              Use range lookup to create risk level intervals
              Use Reformat and lookup() to perform search




