CloverETL Training Sample

A sample from CloverETL training available on-site or online

Published in: Devices & Hardware

  1. Training course goals
     • Develop and support solutions based on CloverETL technology
       – Compose and debug transformations in CloverETL Designer
       – Connect to any number of data sources or sinks (files, databases, web/cloud…)
       – Detect and react to errors in the data
       – Use CloverETL Server to process large amounts of data
       – Design and develop Job Flows to manage complex processes
       – Support existing solutions using CloverETL
     © 2013 Javlin; All rights reserved
  2. Course modules: basics
     1. CloverETL introduction: CloverETL product family, basic terminology
     2. First steps in CloverETL Designer: building transformations with basic components, reading and writing data
     3. Error handling: properly handling input data errors and runtime errors
     4. Common components and CTL programming: commonly used components and business rule development in CTL
     5. Databases: connecting to databases and using them as data sources and targets
  3. Course modules: advanced
     6. Structured data: handling complex data formats like XML or JSON, using web services
     7. Advanced graph design: complex transformation components, Java transformations and more
     8. CloverETL Server: introduction to CloverETL Server, its user interface and execution environment
     9. Jobflows: building jobflows to manage your processes on CloverETL Server
     10. Advanced CloverETL Server: advanced graph scheduling, using Launch Services and CloverETL Cluster
  4. CloverETL product family
     • CloverETL is a whole family of products
       – Support for a broad range of usage scenarios
       – Purely Java-based, supported on many operating systems
         • Windows, *nix, Linux, Mac
     [Diagram: CloverETL Designer, CloverETL Server, CloverETL Engine, Java]
  5. Metadata
     • Metadata describe record structure and format
       – Required for each edge used in the graph, to define the format of the data flowing through that edge
     • Structure defines the fields in the record
       – Field names, unique within the record
       – Data types, which determine the kind of information that can be stored in each field
       – Flat structure: no nesting is allowed
     • Format defines rules for data input and output
       – Format of the record: delimited, fixed-length or mixed
         • Delimiters only apply when working with files
       – Parsing rules for readers and formatting rules for writers
         • Special formatting for numbers, date fields, …
  6. Metadata types and fields
     • Record type determines how to find the fields
       – Delimited: fields are separated by delimiters
       – Fixed-length: each field has a predefined number of characters
       – Mixed: both types of fields in a single record
     • Fields can be of various types
       – Numeric: integer, long, number, decimal
       – Text: string
       – Boolean values: boolean
       – Date and time: date
       – Other: byte, cbyte
       – Containers: list or map of a primitive type

     Transaction
     1  transactionId    long
     2  accountNumber    long
     3  transactionType  string
     4  amount           decimal(20, 3)
     5  timestamp        date
  7. Field ordering matters
     • Ordering of the fields is very important
       – For parsing
       – For output formatting
     • Data is read/written in the same order in which the fields are defined.

     Transaction
     1  transactionId    long
     2  accountNumber    long
     3  transactionType  string
     4  amount           decimal(20, 3)
     5  timestamp        date

     1340817132,3293200814,D,59.940,20100102125243
     1340817156,5357054331,C,6.720,20100116080136
     1340817746,4270100470,D,194.920,20100323100706
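The effect of field ordering can be illustrated outside CloverETL. The following is a minimal Python sketch, not CTL and not the CloverETL parser: the `parse_record` helper, the converter functions and the `%Y%m%d%H%M%S` date pattern are assumptions chosen to match the sample lines on the slide. Each value is converted purely by its position in the field list.

```python
from datetime import datetime
from decimal import Decimal

# Field order mirrors the Transaction metadata on the slide.
# The converters and the date pattern are assumptions made for this sketch.
TRANSACTION_FIELDS = [
    ("transactionId", int),
    ("accountNumber", int),
    ("transactionType", str),
    ("amount", Decimal),
    ("timestamp", lambda text: datetime.strptime(text, "%Y%m%d%H%M%S")),
]


def parse_record(line, fields=TRANSACTION_FIELDS, delimiter=","):
    """Split a delimited line and convert each value by its position."""
    values = line.split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {name: convert(value) for (name, convert), value in zip(fields, values)}


record = parse_record("1340817132,3293200814,D,59.940,20100102125243")
```

If two entries of the field list are swapped, for example `accountNumber` and `transactionType`, the same line no longer parses, because the text `D` is then handed to the integer converter.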
  8. Reformat and CTL code
     • Transformation in Reformat can be written directly in CTL without using Visual Mode
       – Use all CTL features: control structures, error handling, logging…
       – Write comments explaining the complex parts of the code
     • Editor supports syntax highlighting, autocomplete and on-the-fly code validation
  9. Reformat code workflow
     Begin → init → preExecute → transform (once per record; transformOnError on error) → postExecute → End
     • init: called during component initialization.
     • preExecute: called before the first record is processed.
     • transform: main part of the transformation. Called once for each input record. The return value determines which port (if any) receives the result.
     • transformOnError: called only if transform caused an error.
     • postExecute: called after the last record is processed, immediately before the component finishes.
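The call order of the Reformat workflow can be sketched as a small driver loop. This is an illustration in Python, not CTL and not the CloverETL engine: `run_reformat`, the `AmountToCents` component and the `(port, record)` return convention are all hypothetical names and simplifications introduced here.

```python
from collections import defaultdict
from decimal import Decimal


class AmountToCents:
    """Hypothetical component: converts an amount string to integer cents."""

    def __init__(self):
        self.calls = []  # records the order in which the hooks run

    def init(self):
        self.calls.append("init")

    def pre_execute(self):
        self.calls.append("preExecute")

    def transform(self, record):
        self.calls.append("transform")
        cents = int(Decimal(record["amount"]) * 100)  # raises on bad input
        return 0, {"id": record["id"], "cents": cents}

    def transform_on_error(self, error, record):
        self.calls.append("transformOnError")
        return 1, {"id": record["id"], "error": type(error).__name__}

    def post_execute(self):
        self.calls.append("postExecute")


def run_reformat(records, component):
    """Call the hooks in the order listed on the slide; route results by port."""
    ports = defaultdict(list)
    component.init()
    component.pre_execute()
    for record in records:
        try:
            result = component.transform(record)
        except Exception as error:  # only reached when transform fails
            result = component.transform_on_error(error, record)
        if result is not None:  # None stands for "send to no port"
            port, output = result
            ports[port].append(output)
    component.post_execute()
    return dict(ports)


component = AmountToCents()
ports = run_reformat(
    [{"id": 1, "amount": "59.940"}, {"id": 2, "amount": "n/a"}, {"id": 3, "amount": "6.720"}],
    component,
)
```

Here the second record fails conversion, so its result comes from the error handler and lands on port 1, while the other two records go to port 0.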
  10. Data denormalization
      Original data (Account): multiple records grouped based on the key.

      accountId   customerId  balance   created     closed
      9804568699  27345       2300.56   2011-11-14
      1108193472  27345       -1739.05  2005-07-22
      6054951154  27345       4500.60   2009-09-01  2010-04-30
      9459175447  27345       3200.80   2011-03-08

      Denormalized data (CustomerAccount): a single record containing values determined by processing the whole input group.

      customerId  totalBalance  accounts
      27345       8262.91       [9804568699, 1108193472, 6054951154, 9459175447]
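The arithmetic behind the denormalized record can be checked with a short Python sketch. It is not CTL; the `denormalize` helper is a hypothetical name, and only the account values are taken from the slide.

```python
from decimal import Decimal

# One input group: all Account records of customer 27345 from the slide.
accounts = [
    {"accountId": 9804568699, "customerId": 27345, "balance": Decimal("2300.56")},
    {"accountId": 1108193472, "customerId": 27345, "balance": Decimal("-1739.05")},
    {"accountId": 6054951154, "customerId": 27345, "balance": Decimal("4500.60")},
    {"accountId": 9459175447, "customerId": 27345, "balance": Decimal("3200.80")},
]


def denormalize(group):
    """Collapse one group of Account records into a CustomerAccount record."""
    return {
        "customerId": group[0]["customerId"],
        "totalBalance": sum((a["balance"] for a in group), Decimal("0")),
        "accounts": [a["accountId"] for a in group],
    }


customer_account = denormalize(accounts)
```

`Decimal` is used instead of `float` so that the sum matches the slide's total of 8262.91 exactly.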
  11. Denormalizer
      • Converts data into denormalized form
        – Combine multiple records in a group into one output record
        – Output usually uses different metadata
      • Required configuration
        – Transformation code
          • Only CTL can be used, visual mode is not available
        – Grouping
          • Group can be defined based on a key or group size
          • If key is used, data has to be sorted
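The two grouping modes, and the reason for the sorting requirement, can be sketched in Python. This assumes that key-based grouping closes a group whenever the key changes between adjacent records, which is how `itertools.groupby` behaves; the slide states the sorting requirement but does not spell out the mechanism. Both helper names are hypothetical.

```python
from itertools import groupby, islice


def groups_by_key(records, key):
    """Start a new group whenever the key changes between adjacent records."""
    return [list(group) for _, group in groupby(records, key=key)]


def groups_by_size(records, size):
    """Cut the input into consecutive groups of at most `size` records."""
    iterator = iter(records)
    groups = []
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return groups
        groups.append(chunk)


customer_ids = [27345, 10001, 27345]  # 10001 is a made-up second customer

unsorted_groups = groups_by_key(customer_ids, key=lambda cid: cid)
sorted_groups = groups_by_key(sorted(customer_ids), key=lambda cid: cid)
sized_groups = groups_by_size([1, 2, 3, 4, 5], size=2)
```

With unsorted input, customer 27345 ends up in two separate groups; after sorting, all of its records fall into a single group.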
  12. Denormalize code workflow
      Begin → init → preExecute → for each group: append (once per record; appendOnError on error), then transform (transformOnError on error), then clean → next group → postExecute → End
      • transform and append have their own error handler. Each handler interrupts the group and resumes processing as if the group was processed as a whole.
      • append is called once for each record in a group. It is typically used to update global variables which are then used in the transform function.
      • transform is called once per group and is the only function which generates output records.
      • clean is called after each transform and can be used to clean up internal variables.
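The interplay of append, transform and clean can be sketched as a driver loop in Python. This is an illustration, not CTL or the CloverETL engine: `run_denormalizer` and `CustomerTotals` are hypothetical, the error handlers are deliberately left out, and the second customer (30001) is made-up data added so that the effect of clean is visible.

```python
from decimal import Decimal
from itertools import groupby


class CustomerTotals:
    """Hypothetical component that sums balances per customer."""

    def init(self):
        self.clean()

    def pre_execute(self):
        pass

    def append(self, record):
        # Called once per record: only updates the component's state.
        self.customer_id = record["customerId"]
        self.total += record["balance"]
        self.account_ids.append(record["accountId"])

    def transform(self):
        # Called once per group: the only hook that produces output.
        return {
            "customerId": self.customer_id,
            "totalBalance": self.total,
            "accounts": self.account_ids,
        }

    def clean(self):
        # Reset the state so the next group starts from scratch.
        self.customer_id = None
        self.total = Decimal("0")
        self.account_ids = []

    def post_execute(self):
        pass


def run_denormalizer(records, key, component):
    """Drive the hooks in the slide's order (error handlers omitted)."""
    outputs = []
    component.init()
    component.pre_execute()
    for _, group in groupby(records, key=key):  # input must be sorted by key
        for record in group:
            component.append(record)
        outputs.append(component.transform())
        component.clean()
    component.post_execute()
    return outputs


records = [
    {"accountId": 9804568699, "customerId": 27345, "balance": Decimal("2300.56")},
    {"accountId": 1108193472, "customerId": 27345, "balance": Decimal("-1739.05")},
    {"accountId": 5550000001, "customerId": 30001, "balance": Decimal("100.00")},
]
results = run_denormalizer(records, key=lambda r: r["customerId"], component=CustomerTotals())
```

Without the reset in `clean`, the totals and account list of customer 27345 would leak into the record produced for customer 30001.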
