CloverETL Training Sample

Training course goals
• Develop and support solutions based on CloverETL
technology
– Compose and debug transformations in CloverETL Designer
– Connect to any number of data sources or sinks (files,
databases, web/cloud…)
– Detect and react to errors in the data
– Use CloverETL Server to process large amounts of data
– Design and develop Job Flows to manage complex processes
– Support existing solutions using CloverETL

© 2013 Javlin; All rights reserved

Course modules: basics
1. CloverETL introduction
CloverETL product family, basic terminology

2. First steps in CloverETL Designer
Building transformations with basic components, reading and writing data

3. Error handling
Properly handling input data errors and runtime errors

4. Common components and CTL programming
Commonly used components and business rule development in CTL

5. Databases
Connecting to databases and using them as data sources and targets


Course modules: advanced
6. Structured data
Handling complex data formats like XML or JSON, using web services

7. Advanced graph design
Complex transformation components, Java transformations and more

8. CloverETL Server
Introduction to CloverETL Server, its user interface and execution environment

9. Jobflows
Building jobflows to manage your processes on CloverETL Server

10. Advanced CloverETL Server
Advanced graph scheduling, using Launch Services and CloverETL Cluster


CloverETL product family
• CloverETL is a whole family of products
– Support for broad range of usage scenarios
– Purely Java-based – supported on many operating systems
• Windows, *nix, Linux, Mac

[Diagram: CloverETL Designer and CloverETL Server are both built on the CloverETL Engine, which runs on Java.]


Metadata
• Metadata describe record structure and format
– Required for each edge used in the graph to define the format of the data flowing through that edge

• Structure defines fields in the record
– Unique (within record) field names
– Data types determine what kind of information each field can store
– Flat structure – no nesting is allowed

• Format defines rules for data input and output
– Format of the record: delimited, fixed-length or mixed
• Delimiters only apply when working with files
– Parsing rules for readers and formatting rules for writers
• Special formatting for numbers, date fields, …


Metadata types and fields
• Record type determines how to find the fields
– Delimited: fields are separated by delimiters
– Fixed-length: each field has a predefined number of characters
– Mixed: both types of fields in a single record

• Fields can be of various types
– Numeric: integer, long, number, decimal
– Text: string
– Boolean values: boolean
– Date and time: date
– Other: byte, cbyte
– Containers: list or map of a primitive type

Example metadata: Transaction
1 transactionId long
2 accountNumber long
3 transactionType string
4 amount decimal(20, 3)
5 timestamp date


Field ordering matters
• Ordering of the fields is very important
– For parsing
– For output formatting
• Data is read/written in the same order in which the fields are defined.

Transaction
1 transactionId long
2 accountNumber long
3 transactionType string
4 amount decimal(20, 3)
5 timestamp date

1340817132,3293200814,D,59.940,20100102125243
1340817156,5357054331,C,6.720,20100116080136
1340817746,4270100470,D,194.920,20100323100706
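
The effect of field order can be illustrated outside CloverETL. The following is a minimal Python sketch, not CloverETL code: it splits one of the sample lines above and assigns values to fields purely by position. The function name parse_record and the timestamp pattern %Y%m%d%H%M%S are assumptions (the pattern is inferred from the sample values, not stated in the slides).

```python
from datetime import datetime
from decimal import Decimal

# Field order mirrors the Transaction metadata: (field name, parser).
# The timestamp pattern is an assumption inferred from the sample data.
TRANSACTION_FIELDS = [
    ("transactionId", int),
    ("accountNumber", int),
    ("transactionType", str),
    ("amount", Decimal),
    ("timestamp", lambda text: datetime.strptime(text, "%Y%m%d%H%M%S")),
]


def parse_record(line, fields=TRANSACTION_FIELDS, delimiter=","):
    """Split a delimited line and assign values to fields by position."""
    values = line.split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {name: parse(value) for (name, parse), value in zip(fields, values)}


record = parse_record("1340817132,3293200814,D,59.940,20100102125243")
print(record["transactionType"], record["amount"], record["timestamp"])
```

Because values are matched to fields by position only, swapping two fields of the same type in the field list (for example transactionId and accountNumber) would not raise an error; the values would simply land in the wrong fields.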


Reformat and CTL code
• Transformation in Reformat can be written directly in CTL without using Visual Mode
– Use all CTL features: control structures, error handling, logging…
– Write comments explaining the complex parts of the code
• Editor supports syntax highlighting, autocomplete and on-the-fly code validation


Reformat code workflow
Begin
• init: called during component initialization.
• preExecute: called before the first record is processed.
• For each input record (Next record):
– transform: main part of the transformation. Called once for each input record. The return value determines which port (if any) receives the result.
– transformOnError: called only if transform caused an error.
• postExecute: called after the last record is processed, immediately before the component finishes.
End
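
The call order in this workflow can be sketched as a small Python simulation. This is not CTL and not CloverETL's actual engine; the class UppercaseType, the driver function run, and the convention that None means "no port" are all hypothetical and exist only to show when each hook fires.

```python
class UppercaseType:
    """Hypothetical transformation exposing the hooks named in the workflow."""

    def init(self):
        self.log = ["init"]

    def pre_execute(self):
        self.log.append("preExecute")

    def transform(self, record):
        # The return value selects the output port; 0 is the first port.
        record["transactionType"] = record["transactionType"].upper()
        return 0

    def transform_on_error(self, error, record):
        # Called only when transform raised; None here means "no port".
        self.log.append(f"transformOnError: {type(error).__name__}")
        return None

    def post_execute(self):
        self.log.append("postExecute")


def run(transformation, records):
    """Call the hooks in the order described above; collect output per port."""
    ports = {}
    transformation.init()
    transformation.pre_execute()
    for record in records:
        out = dict(record)  # work on a copy of the input record
        try:
            port = transformation.transform(out)
        except Exception as error:
            out = dict(record)
            port = transformation.transform_on_error(error, out)
        if port is not None:
            ports.setdefault(port, []).append(out)
    transformation.post_execute()
    return ports


job = UppercaseType()
result = run(job, [{"transactionType": "d"}, {"transactionType": None}])
print(result)
print(job.log)
```

The second record has no transactionType value, so transform fails on it and transform_on_error decides that the record goes to no port, while processing continues with the next record.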

Data denormalization
Original data: multiple records grouped based on the key.

Account
accountId customerId balance created closed
9804568699 27345 2300.56 2011-11-14
1108193472 27345 -1739.05 2005-07-22
6054951154 27345 4500.60 2009-09-01 2010-04-30
9459175447 27345 3200.80 2011-03-08

Denormalize

Denormalized data: single record containing values determined by processing the whole input group.

CustomerAccount
customerId totalBalance accounts
27345 8262.91 [9804568699, 1108193472, 6054951154, 9459175447]


Denormalizer
• Converts data into denormalized form
– Combine multiple records in a group into one output record
– Output usually uses different metadata

• Required configuration
– Transformation code
• Only CTL can be used, visual mode is not available
– Grouping
• Group can be defined based on a key or group size
• If key is used, data has to be sorted
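
The two grouping options, and the reason keyed grouping needs sorted input, can be sketched in Python. This is an illustration under an assumption: that grouping by key compares only adjacent records, which would explain the sorting requirement above. The helper names group_by_key and group_by_size and the sample tuples are hypothetical, and keeping a shorter final group in size mode is also an assumption.

```python
from itertools import groupby, islice
from operator import itemgetter


def group_by_key(records, key):
    """Group consecutive records that share the same key value."""
    return [list(group) for _, group in groupby(records, key=key)]


def group_by_size(records, size):
    """Group records into chunks of a fixed size, ignoring their content."""
    iterator = iter(records)
    groups = []
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return groups
        groups.append(chunk)


customer = itemgetter(0)
unsorted_records = [("A", 1), ("B", 2), ("A", 3)]
sorted_records = sorted(unsorted_records, key=customer)

split_groups = group_by_key(unsorted_records, customer)  # "A" appears twice
whole_groups = group_by_key(sorted_records, customer)    # one group per key
pairs = group_by_size(unsorted_records, 2)

print(len(split_groups), len(whole_groups), pairs)
```

On unsorted input the records for key "A" end up in two separate groups, so each would produce its own output record; sorting first yields exactly one group per key.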


Denormalize code workflow
Begin
• init
• preExecute
• For each group (Next group):
– append: called once for each record in a group (Next record). It is typically used to update global variables which are then used in the transform function. On error, appendOnError is called.
– transform: called once per group; it is the only function which generates output records. On error, transformOnError is called.
– clean: called after each transform; it can be used to clean up internal variables.
• postExecute
End

• transform and append each have their own error handler. Each handler interrupts the group and resumes processing as if the group was processed as a whole.
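
The append / transform / clean cycle can be sketched in Python using the Account records from the data denormalization slide. This is not CTL; the class CustomerAccountDenormalizer and the driver function denormalize are hypothetical, the error handlers are left out for brevity, and the driver assumes the input is already sorted by customerId.

```python
from decimal import Decimal
from itertools import groupby


class CustomerAccountDenormalizer:
    """Hypothetical handler with the three per-group hooks."""

    def __init__(self):
        self.clean()

    def append(self, record):
        # Called once per input record: only updates internal state.
        self.customer_id = record["customerId"]
        self.total += record["balance"]
        self.accounts.append(record["accountId"])

    def transform(self):
        # Called once per group: the only hook that produces output.
        return {
            "customerId": self.customer_id,
            "totalBalance": self.total,
            "accounts": list(self.accounts),
        }

    def clean(self):
        # Called after each transform: resets state for the next group.
        self.customer_id = None
        self.total = Decimal("0")
        self.accounts = []


def denormalize(records, key, handler):
    """Drive the hooks group by group; input must be sorted by key."""
    output = []
    for _, group in groupby(records, key=key):
        for record in group:
            handler.append(record)
        output.append(handler.transform())
        handler.clean()
    return output


accounts = [
    {"accountId": 9804568699, "customerId": 27345, "balance": Decimal("2300.56")},
    {"accountId": 1108193472, "customerId": 27345, "balance": Decimal("-1739.05")},
    {"accountId": 6054951154, "customerId": 27345, "balance": Decimal("4500.60")},
    {"accountId": 9459175447, "customerId": 27345, "balance": Decimal("3200.80")},
]
result = denormalize(accounts, lambda r: r["customerId"], CustomerAccountDenormalizer())
print(result)
```

The single output record matches the CustomerAccount row shown earlier: the four balances add up to 8262.91 and the account IDs are collected in input order.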
