For each calendar day, we must know clearly what we expect to receive on that day and, for any given data file, what reference day we expect to find inside it.
There can be no doubt or ambiguity: this is information that we need to know in advance and that we have to configure.
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the days on the data files of a Data Warehouse
1. How to have the monitoring of the days on the data files of a Data Warehouse
Recipes of Data Warehouse and Business Intelligence
[Cover illustration: a data file being questioned: "Are you the right one? Have you what I expect? Have you lost some piece?"]
2. • In this article we focus on the management of the loading day of the data file, the reference day of the data, and the expected number of rows. These issues have already been covered briefly in some of my previous articles published on SlideShare and on my blog. Now we see the practical application.
• As a real case, we will use, as an example, the data file of the MTF markets (Multilateral Trading Facilities). A "row" file has been associated with the data file; it contains the number of rows expected in the data file itself (a parsing sketch follows this slide).
• The control file, created by hand for this purpose, is composed of three lines:
#MTF CONTROL FILE OF 20160314
ROWS = 160
#END OF MTF CONTROL FILE OF 20160314
• We suppose that the data file should arrive every working day, and that the reference day is the previous working day.
• The reference day is specified in the file name, but we must be careful, because the feeding system sets as reference the day of production of the data file and not the previous working day.
The use case
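As a minimal sketch of how such a ".row" control file can be checked, the PL/SQL block below reads the declared row count and compares it with the rows actually loaded. This is not the MEF implementation: the directory object DAT_DIR and the staging table MTF_STAGING are assumptions of this example.

-- Minimal sketch, assuming a directory object DAT_DIR pointing to the dat folder
-- and a staging table MTF_STAGING already loaded from the data file.
DECLARE
  l_file     UTL_FILE.FILE_TYPE;
  l_line     VARCHAR2(4000);
  l_expected NUMBER;
  l_loaded   NUMBER;
BEGIN
  l_file := UTL_FILE.FOPEN('DAT_DIR', 'mtf_export_20160314.row', 'R');
  BEGIN
    LOOP
      UTL_FILE.GET_LINE(l_file, l_line);
      -- the second line of the control file has the form "ROWS = 160"
      IF l_line LIKE 'ROWS%' THEN
        l_expected := TO_NUMBER(TRIM(SUBSTR(l_line, INSTR(l_line, '=') + 1)));
      END IF;
    END LOOP;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN NULL;   -- end of control file reached
  END;
  UTL_FILE.FCLOSE(l_file);

  SELECT COUNT(*) INTO l_loaded FROM mtf_staging;

  IF l_loaded = l_expected THEN
    DBMS_OUTPUT.PUT_LINE('OK: ' || l_loaded || ' rows, as declared in the control file');
  ELSE
    DBMS_OUTPUT.PUT_LINE('NOT OK: expected ' || l_expected || ' rows, loaded ' || l_loaded);
  END IF;
END;
/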
3. • Based on the information mentioned above, to get full control of the data file loading, the ETL system should provide me with all the information necessary to fulfill the following requirements.
• We must have a clear vision of the characteristics of the data file, both general and purely technical. In particular, those linked to its name, the file structure, the way the reference day is defined, and the structure of the control file (if present).
• So, we will define the temporal characteristics of the data file by using a code that represents its management.
The control requirements
4. • For convenience, I summarize the ways in which the feeding system can tell me the reference day.
The control requirements
[Diagram] Where is the reference day of the data?
- Inside the data file: in a column of the data file, in the heading of the data file, or in the tail of the data file.
- Outside the data file: in the name of the data file, or missing (in which case the system date is assumed).
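As an illustration, when the reference day travels in the file name (as for the MTF data file of this recipe), it can be extracted with a single expression. This is only a sketch; the 8-digit pattern is an assumption based on the mtf_export_YYYYMMDD.csv naming used here.

-- Sketch: extract the reference day carried in the file name
SELECT TO_DATE(REGEXP_SUBSTR('mtf_export_20160314.csv', '[0-9]{8}'), 'YYYYMMDD') AS reference_day
FROM   dual;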
5. • We must have a clear vision of the internal structure of the data file, i.e. the columns that constitute it. And for each column, as much metadata as possible must be present.
• Both static metadata, such as the type or length, and dynamic metadata, such as the presence of a domain of values, or whether the column is part of the unique key.
The control requirements
6. The control requirements
• We must have a calendar table that, for each calendar day, tells me (simply by duplicating the day) whether I expect the arrival of the data file and what the expected reference day inside the data file of that day is.
• If the data file contains more than one day, I need to know the range of days that I expect.
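A minimal sketch of such a per-day expectation table is shown below. The table and column names echo the IODAY_CFT and FR_YMD names used later in this recipe, but the exact structure is an assumption of this example, not the actual MEF DDL.

-- Sketch of a calendar-driven expectation table (structure assumed for illustration)
CREATE TABLE ioday_cft (
  io_cod       VARCHAR2(30),   -- identifier of the data file
  day_ymd      NUMBER(8),      -- calendar day, format YYYYMMDD
  fr_ymd       NUMBER(8),      -- set to the day itself when the file is expected on that day
  from_dr_ymd  NUMBER(8),      -- expected reference day, or start of the expected range
  to_dr_ymd    NUMBER(8)       -- end of the expected range when the file covers several days
);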
7. The control requirements
• We need to know the final outcome of the processing: the final state and the time taken. If the load has had problems, I need to know the error produced and the programming module that generated it.
• If the outcome is negative, we have to know exactly why it is in error. For example, if the consistency check has failed, I need to know at what point it occurred.
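To make the requirement concrete, a processing-outcome log could be as simple as the sketch below; the table and its columns are assumptions for illustration, not the actual MEF log structure.

-- Sketch of a processing-outcome log table (names assumed for illustration)
CREATE TABLE load_log_dgt (
  io_cod      VARCHAR2(30),    -- processing unit / data file identifier
  day_ymd     NUMBER(8),       -- loading day, YYYYMMDD
  status_cod  VARCHAR2(10),    -- final state, e.g. 'OK' or 'NOT OK'
  elapsed_sec NUMBER,          -- time taken by the processing
  error_txt   VARCHAR2(4000),  -- error produced, if any
  module_nam  VARCHAR2(100)    -- programming module that generated the error
);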
8. The control requirements
• We need to know the final outcome of the control about the loading day and the reference day.
• To get the final outcome of the controls, we have to think about implementing a control logic similar to that shown in the next figure.
• In dark green, the definitely correct situations. In red, the alert situations. In light green, the ones presumably correct but that require attention.
9. The control requirements
[Decision tree] For each data file and calendar day, the control answers three questions: Has the data file arrived? Did it have to arrive? Does the expected day match the reference day?
• Arrived = yes, had to arrive = yes: if expected day = reference day -> 1 - OK (arrived and right day); otherwise -> 2 - NOT OK (arrived but wrong day)
• Arrived = yes, had to arrive = no: if expected day = reference day -> 3 - OK (unexpected file); otherwise -> 4 - NOT OK (unexpected file and wrong day)
• Arrived = yes, had to arrive = maybe: if expected day = reference day -> 5 - OK (maybe file); otherwise -> 6 - NOT OK (maybe file and wrong day)
• Arrived = no, had to arrive = yes -> 7 - NOT OK (missing file)
• Arrived = no, had to arrive = no -> 8 - OK (no file to load)
• Arrived = no, had to arrive = maybe -> 9 - OK (maybe file)
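The same matrix can be expressed as a single SQL CASE. This is just a sketch: the FILE_DAY_CHECK view and its ARRIVED / EXPECTED / DAY_MATCH columns are assumptions used to expose the three answers of the figure, with '?' standing for "maybe".

-- Sketch: the day-control matrix as a CASE expression (view and columns assumed)
SELECT io_cod,
       day_ymd,
       CASE
         WHEN arrived = 'Y' AND expected = 'Y' AND day_match = 'Y' THEN '1 - OK (arrived and right day)'
         WHEN arrived = 'Y' AND expected = 'Y' AND day_match = 'N' THEN '2 - NOT OK (arrived but wrong day)'
         WHEN arrived = 'Y' AND expected = 'N' AND day_match = 'Y' THEN '3 - OK (unexpected file)'
         WHEN arrived = 'Y' AND expected = 'N' AND day_match = 'N' THEN '4 - NOT OK (unexpected file and wrong day)'
         WHEN arrived = 'Y' AND expected = '?' AND day_match = 'Y' THEN '5 - OK (maybe file)'
         WHEN arrived = 'Y' AND expected = '?' AND day_match = 'N' THEN '6 - NOT OK (maybe file and wrong day)'
         WHEN arrived = 'N' AND expected = 'Y'                     THEN '7 - NOT OK (missing file)'
         WHEN arrived = 'N' AND expected = 'N'                     THEN '8 - OK (no file to load)'
         WHEN arrived = 'N' AND expected = '?'                     THEN '9 - OK (maybe file)'
       END AS day_check_status
FROM   file_day_check;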
10. The control requirements
• We must receive the result of the processing via e-mail.
• Using the Micro ETL Foundation we can handle this situation and its control in a few steps.
MEF:
Open the link:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Go to the Mef_v2 folder and follow the instructions of the readme file.
The data file is in the folder .. dat and is called mtf_export_20160314.csv. The control file with the expected number of rows is called mtf_export_20160314.row. It is also present in the folder .. dat.
The file that configures the data file fields is located in the folder .. cft and is called mtf.csv.
11. The configuration of the data and control file
• The first step is to insert into a configuration table, which we will call IO_CFT for brevity, all the information we know about the features of the data file we load. In this case, the IO_CFT table must also hold the information relating to the control file.
• The second step is to insert into the IO_CFT table the information relative to the expected day of arrival of the data file. We must define a code, which we will call FR_COD (File Reference Code), behind which lies the load logic of a second configuration table that we will call IODAY_CFT. The FR_COD code represents the arrival frequency. For the moment, I have defined some commonly used values:
• AD = every day. The data file must arrive every day, so in the IODAY_CFT table all the days will be set.
• AWD = all working days. The data file must arrive only on working days, so all holidays plus Saturdays and Sundays will be left null.
• ? = I do not know when it arrives, it is variable. Typical of monthly flows for which nobody knows precisely when they become available.
• Based on the FR_COD code, the IODAY_CFT table will be loaded by setting the presence of the expected day in the FR_YMD field (a sketch of this loading follows below).
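As a sketch of this loading logic only: in the MEF it is handled by the mef_sta_build.p_objday_cft procedure mentioned later, while the statement below, with its hypothetical calendar_t holiday table, just illustrates how FR_YMD could be set for FR_COD = 'AWD'.

-- Hypothetical sketch: set FR_YMD only for working days (FR_COD = 'AWD').
-- calendar_t and its holiday_flg column are assumptions, not MEF objects.
UPDATE ioday_cft d
   SET d.fr_ymd = d.day_ymd
 WHERE d.io_cod = 'MTF'
   AND TO_CHAR(TO_DATE(TO_CHAR(d.day_ymd), 'YYYYMMDD'), 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
       NOT IN ('SAT', 'SUN')
   AND NOT EXISTS (SELECT 1
                     FROM calendar_t c               -- hypothetical holiday calendar
                    WHERE c.day_ymd = d.day_ymd
                      AND c.holiday_flg = 1);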
12. Reference day configuration
• The third step is to insert into the IO_CFT table the information relating to the expected reference day.
• The DR_COD code indicates what the reference day for the data in the data file should be. Remember that the reference day must be present or implied. The same logic applied to the FR_COD field also applies to the DR_COD field: it will serve to set the IODAY_CFT table. For the moment I have defined some commonly used values (see the sketch after this list):
• 0 = the reference day coincides with the current day.
• 1 = the reference day coincides with the day before, that is, the current day -1.
• 1W = the reference day is the first preceding business day.
• The configuration of the IODAY_CFT table occurs only once, during the configuration of the data file. Afterwards, it no longer needs to change.
• Note that the use of the codes is just a way to quickly facilitate the setting of the IODAY_CFT table. Nothing prevents you from modifying the table manually or with ad-hoc SQL.
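For example, a possible derivation of the expected reference day for DR_COD = '1W' could look like the sketch below. The real logic lives in the mef_sta_build.p_dr_cod function cited later; this version is an assumption that skips Saturdays and Sundays only, ignoring holidays.

-- Hypothetical sketch: previous working day (weekend handling only).
SELECT day_ymd,
       TO_CHAR(
         CASE TO_CHAR(TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD'), 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
           WHEN 'MON' THEN TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD') - 3   -- Monday -> previous Friday
           WHEN 'SUN' THEN TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD') - 2   -- Sunday -> previous Friday
           ELSE            TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD') - 1   -- any other day -> previous day
         END, 'YYYYMMDD') AS dr_ymd
  FROM ioday_cft
 WHERE io_cod = 'MTF';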
13. Configuration of the correction factor
• The OFF_COD code present in IO_CFT indicates the correction factor to be applied to the reference day indicated by the feeding system. OFF_COD does not take part in the control; it acts as a corrector of the day at run-time (a sketch follows below). For the moment I have defined some commonly used codes:
• 0 = the reference day coincides with the day indicated by the feeding system.
• 1 = the reference day coincides with the day before, that is, the current day -1.
• 1W = the reference day coincides with the previous working day.
• The FROM_DR_YMD and TO_DR_YMD fields have the same meaning as the FR_COD field, but allow you to identify a range of possible reference days. For the moment, only one code has been defined:
• PM = the previous month of the current calendar day.
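A minimal sketch of the run-time correction, assuming a simplified stand-alone function: the actual MEF code is the mef_sta.f_off_cod function cited later, so the name, signature and weekend-only handling below are assumptions.

-- Hypothetical sketch: apply the OFF_COD correction to the day declared by the feeding system.
CREATE OR REPLACE FUNCTION f_apply_off_cod (
  p_off_cod IN VARCHAR2,   -- '0', '1' or '1W'
  p_day_ymd IN VARCHAR2    -- day declared by the feeding system, YYYYMMDD
) RETURN VARCHAR2 IS
  v_day DATE := TO_DATE(p_day_ymd, 'YYYYMMDD');
BEGIN
  IF p_off_cod = '0' THEN
    RETURN p_day_ymd;                                   -- no correction
  ELSIF p_off_cod = '1' THEN
    RETURN TO_CHAR(v_day - 1, 'YYYYMMDD');              -- previous day
  ELSIF p_off_cod = '1W' THEN                           -- previous working day (Sat/Sun only)
    CASE TO_CHAR(v_day, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
      WHEN 'MON' THEN RETURN TO_CHAR(v_day - 3, 'YYYYMMDD');
      WHEN 'SUN' THEN RETURN TO_CHAR(v_day - 2, 'YYYYMMDD');
      ELSE            RETURN TO_CHAR(v_day - 1, 'YYYYMMDD');
    END CASE;
  END IF;
  RETURN p_day_ymd;                                     -- unknown code: leave the day unchanged
END f_apply_off_cod;
/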
MEF:
The data file is in the .. dat folder and is called mtf_export_20160314.csv.
The control file with the expected number of rows is called mtf_export_20160314.row; it is also in the .. dat folder.
The file that configures the data file field structure is in the .. cft folder and is called mtf.csv.
The configuration file of the data file is called io_mtf.txt and is under the .. cft folder. It has the following settings:
14. The configuration file
IO_COD: MTF (file identifier)
IO_DEB: Multilateral Trading Facilities (file description)
TYPE_COD: FIN (file type - input file)
SEC_COD: ESM (feeding system: ESMA)
FRQ_COD: D (frequency - daily)
FILE_LIKE_TXT: mtf_export%.csv (generic name of the file, without the day)
FILE_EXT_TXT: mtf_export_20160314.csv (name of the sample data file)
HOST_NC: ., (priority to the decimal point)
HEAD_CNT: 1 (number of header rows)
FOO_CNT: 0 (number of tail rows)
SEP_TXT: , (separator symbol if csv)
START_NUM: 12 (starting character of the day in the name)
SIZE_NUM: 8 (size of the day)
RROW_NUM: 2 (row of the control file that contains the number of file rows)
RSTART_NUM: 8 (character at which the number of rows begins)
RSIZE_NUM: 6 (size of the number)
MASK_TXT: YYYYMMDD (format of the day)
FR_COD: AWD (file reference code)
DR_COD: 1W (day reference code)
OFF_COD: 1W (offset on the day reference)
RCF_LIKE_TXT: mtf_export%.row (generic name of the control file, without the day)
RCF_EXT_TXT: mtf_export_20160314.row (name of the sample control file)
FTB_TXT: NEWLINE (row-end indicator for the Oracle external table)
TRUNC_COD: 1 (indicates whether the staging table should be truncated before loading)
NOTE_IO_COD: MTF (presence of a notes file)
15. The configuration file
MEF:
The DR_COD code is managed by the mef_sta_build.p_dr_cod function.
The FR_COD code is managed by the mef_sta_build.p_fr_cod function.
The OFF_COD code is managed by the mef_sta.f_off_cod function. See further details in Recipe 12 on SlideShare.
The functions that handle the day range are mef_sta_build.p_from_dr_cod and mef_sta_build.p_to_dr_cod.
In this way, by changing the functions, we can define other codes. The mef_sta_build.p_objday_cft procedure will load the IODAY_CFT table.
The complete configuration of the data file is done by launching:
SQL> @sta_conf_io MTF
16. The data file loading
• The data file loading process must insert into a log table the information related to the processing day and to the reference day received from the feeding system.
MEF:
SQL> exec mef_job.p_run('sta_esm_mtf');
• By comparing, at the end of the loading, what was configured with what was loaded, we can infer a final outcome of the process. This comparison can be displayed by means of a view which we will call IODAY_CFV (a query sketch follows below).
• The logic with which the view works was summarized in a previous figure. On the basis of this outcome, an intervention strategy must be agreed upon.
• In our example, launched on a working day, we see that there is a problem related to the reference day.
• There is also another problem to be investigated: the number of rows declared in the control file is different from the number of rows loaded.
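A sketch of how this comparison could be inspected; only the view name IODAY_CFV comes from the text, while the column names below are assumptions.

-- Hypothetical sketch: inspect the outcome of today's load for the MTF data file.
SELECT io_cod,
       day_ymd,         -- calendar day of the load (assumed column)
       fr_ymd,          -- expected arrival day (assumed column)
       dr_ymd,          -- expected reference day (assumed column)
       outcome_txt      -- outcome derived by the control logic (assumed column)
  FROM ioday_cfv
 WHERE io_cod = 'MTF'
   AND day_ymd = TO_CHAR(SYSDATE, 'YYYYMMDD');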
17. Conclusion
• Whatever way we implement an ETL solution, the important point to emphasize is that we need to know, in advance, the time characteristics of the data file we are going to load.
• For each calendar day, we must be clear about what we expect to receive on that day and, for any given data file, what reference day we expect to find inside it.
• There can be no doubt or ambiguity: this is information that we need to know in advance and that we have to configure. After the loading of the Staging Area, only the comparison between what we expected to receive and what we actually received will allow us to evaluate the correctness of the loaded data.
• Just remember that this correctness check is a priority, it is the first check, and it refers only to the two time components of the data. Only if these checks are positive does it make sense to continue with the other quality controls.
18. References
On Slideshare:
the series: Recipes of Data Warehouse and Business Intelligence.
Blog:
http://microetlfoundation.blogspot.it
http://massimocenci.blogspot.it/
Micro ETL Foundation free source at:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Last version v2.
Email:
massimo_cenci@yahoo.it