How to monitor the days of the data files of a Data Warehouse
Recipes of Data Warehouse and Business Intelligence
[Figure: a data file being questioned: "Are you the right one? Do you have what I expect? Have you lost any pieces?"]
The use case
• In this article we focus on managing the loading day of the data file, the
reference day of the data, and the expected number of rows. These issues have
already been covered briefly in some of my previous articles published on
Slideshare and on my blog. Here we see the practical application.
• As a real case, we will use the data file of the MTF markets
(Multilateral Trading Facilities). The data file is accompanied by a "row" file (the
control file) that contains the number of rows expected in the data file itself.
• The control file, created by hand for this purpose, is composed of three lines:
#MTF CONTROL FILE OF 20160314
ROWS = 160
#END OF MTF CONTROL FILE OF 20160314
• We assume that the data file should arrive every working day, and that the reference
day is the previous working day.
• The reference day is specified in the file name, but we must be careful, because the
feeding system sets as the reference the day of production of the data file and not the
previous working day.
The control requirements
• Based on the information above, to have full control of the data file loading, the
ETL system should provide all the information needed to fulfill the following
requirements.
• We must have a clear view of the characteristics of the data file, both general
and purely technical: in particular, those linked to its name, the file structure, the
way the reference day is defined, and the structure of the control file (if present).
• We will therefore define the temporal characteristics of the data file by using a
code that represents how it is managed.
• For convenience, I summarize the ways in which the feeding system can tell me the
reference day.
Figure: Where is the reference day of the data?
Inside the data file: in a column of the data file; in the heading of the data file; in the tail of the data file.
Outside the data file: in the name of the data file; missing (assume the system date).
• We must have a clear view of the internal structure of the data file, i.e. the
columns that constitute it. For each column, as much metadata as possible must be
available.
• This includes both static metadata, such as the type or length, and dynamic metadata,
such as the presence of a domain of values, or whether the column is part of the unique key.
• We must have a calendar table that, for each calendar day, tells me (simply by
duplicating the day) whether I expect the arrival of the data file and what
reference day I expect in the data file of that day.
• If the data file contains several days, I need to know the range of days
that I expect.
• We need to know the final outcome of the processing: the final state and the time
taken. If the load has had problems, I need to know the error produced and which
programming module generated it.
• If the outcome is negative, we have to know exactly why it is in error. For
example, if the consistency check has failed, I need to know at what point it
occurred.
• We need to know the final outcome of the checks on the loading day and the
reference day.
• To get the final outcome of the checks, we have to implement a control logic
similar to that shown in the next figure.
• In dark green, the definitely correct situations. In red, the alert situations. In light
green, those presumably correct but requiring attention.
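Figure: control logic for the loading day and the reference day. Three questions are asked in order: "Has the data file arrived?" (yes / no), "Was it expected to arrive?" (yes / no / maybe) and "Expected day = reference day?" (yes / no).
1 - OK (arrived and right day): arrived = yes, expected = yes, expected day = reference day
2 - NOT OK (arrived but wrong day): arrived = yes, expected = yes, expected day differs from reference day
3 - OK (unexpected file): arrived = yes, expected = no, expected day = reference day
4 - NOT OK (unexpected file and wrong day): arrived = yes, expected = no, expected day differs from reference day
5 - OK (maybe file): arrived = yes, expected = maybe, expected day = reference day
6 - NOT OK (maybe file and wrong day): arrived = yes, expected = maybe, expected day differs from reference day
7 - NOT OK (missing file): arrived = no, expected = yes
8 - OK (no file to load): arrived = no, expected = no
9 - OK (maybe file): arrived = no, expected = maybe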
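The nine outcomes of the figure can be sketched as a small decision function. This is only an illustration in Python, not the MEF implementation: the function name control_outcome and its parameters are hypothetical, and the outcome numbers follow the labels of the figure.

```python
from typing import Optional, Tuple


def control_outcome(arrived: bool, expected: str,
                    day_matches: Optional[bool] = None) -> Tuple[int, str]:
    """Return (outcome number, status) for the nine cases of the figure.

    arrived     -- has the data file arrived?
    expected    -- "yes", "no" or "maybe": was the file expected to arrive?
    day_matches -- is the expected day equal to the reference day?
                   (only meaningful when the file has arrived)
    """
    if expected not in ("yes", "no", "maybe"):
        raise ValueError("expected must be 'yes', 'no' or 'maybe'")

    if arrived:
        if day_matches is None:
            raise ValueError("day_matches is required when the file has arrived")
        # Odd numbers 1, 3, 5 are the "right day" cases; the next even number
        # is the corresponding "wrong day" case.
        base = {"yes": 1, "no": 3, "maybe": 5}[expected]
        return (base, "OK") if day_matches else (base + 1, "NOT OK")

    # File not arrived: only a file that was definitely expected is an alert.
    return {"yes": (7, "NOT OK"), "no": (8, "OK"), "maybe": (9, "OK")}[expected]
```

For example, a file that arrived on a day when it was expected, but with the wrong reference day, falls in case 2 (NOT OK).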
• We must receive the result of the processing via e-mail.
• Using the Micro ETL Foundation (MEF), we can handle this situation and its control in a
few steps.
MEF:
Open the link:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Go to the Mef_v2 folder and follow the instructions in the readme file.
The data file is in the .. dat folder and is called mtf_export_20160314.csv. The control file with the expected number of rows is
called mtf_export_20160314.row and is also in the .. dat folder.
The file that configures the data file fields is in the .. cft folder and is called mtf.csv.
The configuration of the data and control file
• The first step is to insert into a configuration table, which we will call IO_CFT for
brevity, all the information that we know about the features of the data file that we
load. In this case, the IO_CFT table must also hold the information relating to the
control file.
• The second step is to insert in the IO_CFT table the information about the
expected day of arrival of the data file. We define a code, called FR_COD (File
Reference Code), which drives the loading logic of a second configuration
table that we will call IODAY_CFT. The FR_COD code represents the arrival frequency.
For the moment, I have defined some commonly used values:
• AD = Every day. The data file must arrive every day, so in the
IODAY_CFT table all the days will be set.
• AWD = All working days. The data file must arrive only on
working days, so all holidays, plus Saturdays and Sundays, will be null.
• ? = I do not know when it arrives; it is variable. Typical of monthly flows for which
no one knows precisely when they will be available.
• Based on the FR_COD code, the IODAY_CFT table will be loaded by setting the
expected day in the FR_YMD field.
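As an illustration of how the FR_COD codes could drive the FR_YMD value, here is a minimal Python sketch. It is not the MEF code: the function fr_ymd and the holidays parameter are hypothetical, working days are assumed to be Monday to Friday minus the given holidays, and returning None for the "?" code is an assumption, since the text does not say how an unknown arrival day is stored.

```python
from datetime import date
from typing import FrozenSet, Optional


def fr_ymd(day: date, fr_cod: str,
           holidays: FrozenSet[date] = frozenset()) -> Optional[date]:
    """Return the expected arrival day for one calendar day, or None (null)."""
    if fr_cod == "AD":
        # Every day: the calendar day itself is always set.
        return day
    if fr_cod == "AWD":
        # All working days: Saturdays, Sundays and holidays stay null.
        is_working_day = day.weekday() < 5 and day not in holidays
        return day if is_working_day else None
    if fr_cod == "?":
        # Unknown arrival day: left null here (assumption, see the lead-in).
        return None
    raise ValueError("unknown FR_COD: " + fr_cod)


# One calendar row per day, from Saturday 12 to Tuesday 15 March 2016.
calendar = {d: fr_ymd(d, "AWD") for d in
            (date(2016, 3, 12), date(2016, 3, 13), date(2016, 3, 14), date(2016, 3, 15))}
```

With AWD, only Monday 14 and Tuesday 15 March 2016 get an expected day; the weekend rows stay null.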
Reference day configuration
• The third step is to insert in the IO_CFT table the information relating to the expected
reference day.
• The DR_COD code must indicate what the reference day of the data in the
data file should be. Recall that the reference day must be present or implied. The same
logic applied to the FR_COD field also applies to the DR_COD field: it serves to
set the IODAY_CFT table. For the moment I have defined some commonly used values:
• 0 = the reference day coincides with the current day.
• 1 = the reference day coincides with the day before, that is, the current day - 1.
• 1W = the reference day is the first preceding working day.
• The configuration of the IODAY_CFT table occurs only once, during
the configuration of the data file. Afterwards, it no longer needs to change.
• Note that the codes are just a quick way to set up the
IODAY_CFT table. Nothing prevents you from modifying the table manually or with ad-hoc
SQL.
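The three DR_COD values listed above can be resolved with a few lines of date arithmetic. The following Python sketch is only an illustration under stated assumptions: the names previous_working_day and expected_reference_day are hypothetical, and a working day is assumed to be Monday to Friday minus an optional set of holidays.

```python
from datetime import date, timedelta
from typing import FrozenSet


def previous_working_day(day: date, holidays: FrozenSet[date] = frozenset()) -> date:
    """Step back one day at a time until a working day is found."""
    candidate = day - timedelta(days=1)
    while candidate.weekday() >= 5 or candidate in holidays:
        candidate -= timedelta(days=1)
    return candidate


def expected_reference_day(day: date, dr_cod: str,
                           holidays: FrozenSet[date] = frozenset()) -> date:
    """Return the reference day expected for a file arriving on `day`."""
    if dr_cod == "0":
        return day                          # same day
    if dr_cod == "1":
        return day - timedelta(days=1)      # calendar day before
    if dr_cod == "1W":
        return previous_working_day(day, holidays)
    raise ValueError("unknown DR_COD: " + dr_cod)
```

For a file arriving on Monday 14 March 2016, the code 1W gives Friday 11 March 2016, while the code 1 gives Sunday 13 March 2016.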
Configuration of the correction factor
• The OFF_COD code in IO_CFT indicates the correction factor to be applied to
the reference day indicated by the feeding system. OFF_COD plays no part in the
control; it acts as a corrector of the day at run-time. For the moment I have
defined some commonly used codes:
• 0 = the reference day coincides with the day indicated by the feeding system.
• 1 = the reference day coincides with the day before (day - 1).
• 1W = the reference day coincides with the previous working day.
• The FROM_DR_YMD and TO_DR_YMD fields have the same meaning as the FR_COD
field, but allow you to identify a range of possible reference days. For the moment,
only one code has been defined:
• PM = the previous month of the current calendar day.
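As a sketch of what the PM code could produce, the following Python function computes the first and last day of the month preceding a given calendar day. The function name previous_month_range is hypothetical, and mapping the two results to FROM_DR_YMD and TO_DR_YMD is an assumption about how the range is stored.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_month_range(day: date) -> Tuple[date, date]:
    """Return (first day, last day) of the month before the month of `day`."""
    first_of_this_month = day.replace(day=1)
    # The day before the first of this month is the last day of the previous month.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


# Assumed mapping: FROM_DR_YMD = start of the range, TO_DR_YMD = end of the range.
from_dr_ymd, to_dr_ymd = previous_month_range(date(2016, 3, 14))
```

For 14 March 2016 the range is 1 February 2016 to 29 February 2016 (2016 is a leap year).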
The configuration file
MEF:
The configuration file of the data file is called io_mtf.txt and is in the .. cft folder. It has the following settings:
IO_COD: MTF (file identifier)
IO_DEB: Multilateral Trading Facilities (file description)
TYPE_COD: FIN (file type - input file)
SEC_COD: ESM (feeding system: ESMA)
FRQ_COD: D (frequency - Daily)
FILE_LIKE_TXT: mtf_export%.csv (generic name of the file without day)
FILE_EXT_TXT: mtf_export_20160314.csv (name of the sample data file)
HOST_NC: ., (Priority on the decimal point)
HEAD_CNT: 1 (number of rows in header)
FOO_CNT: 0 (number of rows in tail)
SEP_TXT: , (separator symbol if csv)
START_NUM: 12 (starting character of the day in the name)
SIZE_NUM: 8 (size of day)
RROW_NUM: 2 (row of the control file that contains the number of rows)
RSTART_NUM: 8 (position where the number of rows begins)
RSIZE_NUM: 6 (size of the number)
MASK_TXT: YYYYMMDD (format of the day)
FR_COD: AWD (file reference code)
DR_COD: 1W (day reference code)
OFF_COD: 1W (offset on day reference)
RCF_LIKE_TXT: mtf_export%.row (generic name of control file without day)
RCF_EXT_TXT: mtf_export_20160314.row (name of the sample control file)
FTB_TXT: NEWLINE (indicator of the row end for the Oracle external table)
TRUNC_COD: 1 (indicates whether the staging table should be truncated before loading)
NOTE_IO_COD: MTF (presence of a notes file)
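The positional settings above can be checked against the sample file names with a short Python sketch. It is an illustration, not the MEF code: the function names are hypothetical, and the positions are assumed to be 1-based, which is consistent with START_NUM 12 pointing at the first digit of the day in mtf_export_20160314.csv.

```python
from datetime import date, datetime
from typing import List

# Values copied from the io_mtf.txt settings (positions assumed to be 1-based).
START_NUM, SIZE_NUM = 12, 8                  # day inside the data file name
RROW_NUM, RSTART_NUM, RSIZE_NUM = 2, 8, 6    # row count inside the control file


def day_from_file_name(file_name: str) -> date:
    """Extract the day written in the file name (MASK_TXT is YYYYMMDD)."""
    text = file_name[START_NUM - 1:START_NUM - 1 + SIZE_NUM]
    return datetime.strptime(text, "%Y%m%d").date()


def rows_from_control_file(lines: List[str]) -> int:
    """Extract the declared number of rows from the control file lines."""
    row = lines[RROW_NUM - 1]
    return int(row[RSTART_NUM - 1:RSTART_NUM - 1 + RSIZE_NUM].strip())


control_lines = [
    "#MTF CONTROL FILE OF 20160314",
    "ROWS = 160",
    "#END OF MTF CONTROL FILE OF 20160314",
]
file_day = day_from_file_name("mtf_export_20160314.csv")
declared_rows = rows_from_control_file(control_lines)
```

With the sample files, the day read from the name is 14 March 2016 and the declared number of rows is 160.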
MEF:
The DR_COD code is managed by the mef_sta_build.p_dr_cod function.
The FR_COD code is managed by the mef_sta_build.p_fr_cod function.
The OFF_COD code is managed by the mef_sta.f_off_cod function. See further details in Recipe 12 on Slideshare.
The functions that handle the day range are mef_sta_build.p_from_dr_cod and mef_sta_build.p_to_dr_cod.
By changing these functions we can define other codes. The mef_sta_build.p_objday_cft will load the IODAY_CFT table.
The complete configuration of the data file is done by launching the procedure:
SQL> @sta_conf_io MTF
The data file loading
• The loading process of the data file must insert into a log table the information
about the processing day and the reference day received from the feeding
system.
MEF:
SQL> exec mef_job.p_run('sta_esm_mtf');
• By comparing, at the end of loading, what is configured with what is loaded, we can
infer a final outcome of the process. This comparison can be displayed by means of
a view, which we will call IODAY_CFV.
• The logic of the view was summarized in a previous figure. On the
basis of this outcome, an intervention strategy must be agreed upon.
• In our example, launched on a working day, we see that there is a problem related to
the reference day.
• There is also another problem to be investigated: the number of rows declared in the
control file is different from the number of rows loaded.
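The row-count check mentioned above amounts to comparing the number declared in the control file with the number of data rows found in the file. The Python sketch below is only an illustration: the function check_row_count and the sample lines are hypothetical, and it assumes that the header and tail rows (HEAD_CNT and FOO_CNT in the configuration) are not counted as data rows.

```python
from typing import Dict, List, Union


def check_row_count(data_lines: List[str], declared_rows: int,
                    head_cnt: int = 1, foo_cnt: int = 0) -> Dict[str, Union[int, bool]]:
    """Compare the declared number of rows with the data rows actually present."""
    # Header and tail rows are excluded from the count (assumption).
    loaded_rows = len(data_lines) - head_cnt - foo_cnt
    return {
        "declared": declared_rows,
        "loaded": loaded_rows,
        "match": loaded_rows == declared_rows,
    }


# Made-up sample: one header row followed by three data rows.
sample = ["col_a,col_b", "1,x", "2,y", "3,z"]
result = check_row_count(sample, declared_rows=4)
```

Here the control value 4 does not match the 3 data rows found, so the check reports a mismatch to be investigated.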
Conclusion
• However we implement an ETL solution, the important point to emphasize is
that we need to know in advance the time characteristics of the data file that we will
load.
• For each calendar day, it must be clear what I expect to receive on that day and,
for any given data file, what reference day I expect to find inside.
• There can be no doubt or ambiguity: this is information that we need to know in advance
and that we have to configure. After the loading of the Staging Area, only the comparison
between what we expected to receive and what we actually received will allow us
to evaluate the correctness of the loaded data.
• Just remember that this correctness check has priority: it is the first check, and it
refers only to the two time components of the data. Only if these checks are positive
will it make sense to continue with the other quality controls.
References
On Slideshare:
the series: Recipes of Data Warehouse and Business Intelligence.
Blog:
http://microetlfoundation.blogspot.it
http://massimocenci.blogspot.it/
Micro ETL Foundation free source at:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Last version v2.
Email:
massimo_cenci@yahoo.it

Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Oracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sqlOracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sql
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisiNote di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
 

Recently uploaded

Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
jpupo2018
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 

Recently uploaded (20)

Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 

Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the days on the data files of a Data Warehouse

  • 1. How to monitor the days of the data files in a Data Warehouse. Recipes of Data Warehouse and Business Intelligence. [Figure: three questions addressed to the DATA FILE: "Are you the right one?", "Do you have what I expect?", "Have you lost any pieces?"]
  • 2. The use case
      • In this article we focus on managing the loading day of the data file, the reference day of the data, and the expected number of rows. I have already covered these topics briefly in previous articles published on SlideShare and on my blog. Now we look at a practical application.
      • As a real case, we will use the data file of the MTF markets (Multilateral Trading Facilities). The data file comes with a "row" file (the control file) that contains the number of rows expected in the data file.
      • The control file, created by hand for this purpose, consists of three lines:
        #MTF CONTROL FILE OF 20160314
        ROWS = 160
        #END OF MTF CONTROL FILE OF 20160314
      • We assume that the data file should arrive every working day and that the reference day is the previous working day.
      • The reference day is specified in the file name, but we must be careful: the feeding system writes there the day on which the data file was produced, not the previous working day.
  • 3. The control requirements
      • Based on the above, to keep the loading of the data file fully under control, the ETL system should provide all the information needed to fulfil the following requirements.
      • We must have a clear view of the characteristics of the data file, both general and purely technical: in particular those related to its name, its structure, the way the reference day is defined, and the structure of the control file (if present).
      • We will therefore define the temporal characteristics of the data file by means of codes that represent how it is managed.
  • 4. The control requirements
      • For convenience, I summarize the ways in which the feeding system can communicate the reference day.
      • [Figure: where is the reference day of the data?]
          • Inside the data file: in a column of the data file; in the heading of the data file; in the tail of the data file.
          • Outside the data file: in the name of the data file; missing, in which case the system date is assumed.
  • 5. The control requirements
      • We must have a clear view of the internal structure of the data file, i.e. the columns that make it up. For each column, as much metadata as possible must be available.
      • This includes both static metadata, such as type or length, and dynamic metadata, such as the presence of a domain of values or whether the column is part of the unique key.
  • 6. The control requirements
      • We must have a calendar table that, for each calendar day, tells us (simply by duplicating the day) whether we expect the data file to arrive and which reference day we expect to find in the data file of that day.
      • If the data file contains several days, we need to know the range of days that we expect.
  • 7. The control requirements
      • We need to know the final outcome of the processing: the final state and the time taken. If the load had problems, we need to know the error raised and which programming module generated it.
      • If the outcome is negative, we have to know exactly why the load failed. For example, if the consistency check failed, we need to know at which point it failed.
  • 8. The control requirements
      • We need to know the final outcome of the checks on the loading day and the reference day.
      • To obtain this outcome, we have to implement a control logic similar to the one shown in the next figure.
      • In the figure, dark green marks the situations that are certainly correct, red marks the alert situations, and light green marks those that are presumably correct but require attention.
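  • 9. The control requirements
      • [Figure: decision tree with three questions: "Has the data file arrived?" (yes/no), "Was it due to arrive?" (yes/no/maybe), "Expected day = reference day?" (yes/no). Its nine leaves are listed below.]
      • Arrived: yes. Due to arrive: yes. Expected day = reference day: yes. Outcome: 1 - OK (arrived and right day)
      • Arrived: yes. Due to arrive: yes. Expected day = reference day: no. Outcome: 2 - NOT OK (arrived but wrong day)
      • Arrived: yes. Due to arrive: no. Expected day = reference day: yes. Outcome: 3 - OK (unexpected file)
      • Arrived: yes. Due to arrive: no. Expected day = reference day: no. Outcome: 4 - NOT OK (unexpected file and wrong day)
      • Arrived: yes. Due to arrive: maybe. Expected day = reference day: yes. Outcome: 5 - OK (maybe file)
      • Arrived: yes. Due to arrive: maybe. Expected day = reference day: no. Outcome: 6 - NOT OK (maybe file and wrong day)
      • Arrived: no. Due to arrive: yes. Outcome: 7 - NOT OK (missing file)
      • Arrived: no. Due to arrive: no. Outcome: 8 - OK (no file to load)
      • Arrived: no. Due to arrive: maybe. Outcome: 9 - OK (maybe file)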
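The decision tree of the control requirements can be written down as a small lookup function. The following Python sketch is only an illustration of the nine leaves of the figure: the function name `day_check_outcome` and its parameters are hypothetical and are not part of the Micro ETL Foundation, which implements this logic in the database.

```python
from typing import Optional, Tuple

# Leaves reached when the data file has arrived:
# (was it due to arrive?, expected day == reference day?) -> (code, label)
_ARRIVED = {
    ("yes", True): (1, "OK (arrived and right day)"),
    ("yes", False): (2, "NOT OK (arrived but wrong day)"),
    ("no", True): (3, "OK (unexpected file)"),
    ("no", False): (4, "NOT OK (unexpected file and wrong day)"),
    ("maybe", True): (5, "OK (maybe file)"),
    ("maybe", False): (6, "NOT OK (maybe file and wrong day)"),
}

# Leaves reached when the data file has not arrived.
_NOT_ARRIVED = {
    "yes": (7, "NOT OK (missing file)"),
    "no": (8, "OK (no file to load)"),
    "maybe": (9, "OK (maybe file)"),
}


def day_check_outcome(arrived: bool, due: str,
                      day_matches: Optional[bool] = None) -> Tuple[int, str]:
    """Return the (code, label) leaf of the decision tree.

    `due` is 'yes', 'no' or 'maybe'. `day_matches` is required only when
    the file has arrived, because the day comparison needs a file.
    """
    if due not in _NOT_ARRIVED:
        raise ValueError("due must be 'yes', 'no' or 'maybe'")
    if not arrived:
        return _NOT_ARRIVED[due]
    if day_matches is None:
        raise ValueError("day_matches is required when the file has arrived")
    return _ARRIVED[(due, day_matches)]
```

For example, a file that was due and arrived with a reference day different from the expected one falls into case 2, while a file that was due and never arrived falls into case 7.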
  • 10. The control requirements
      • We must receive the result of the processing via e-mail.
      • Using the Micro ETL Foundation (MEF) we can handle this situation and its checks in a few steps.
      • MEF: open the link https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k, go to the Mef_v2 folder and follow the instructions in the readme file.
      • The data file is in the folder .. dat and is called mtf_export_20160314.csv.
      • The control file with the expected number of rows is called mtf_export_20160314.row. It is also in the folder .. dat.
      • The file that configures the fields of the data file is in the folder .. cft and is called mtf.csv.
  • 11. The configuration of the data and control file
      • The first step is to insert into a configuration table, which we will call IO_CFT for brevity, everything we know about the features of the data file to be loaded. In this case, the IO_CFT table must also hold the information about the control file.
      • The second step is to insert into the IO_CFT table the information about the expected day of arrival of the data file. We define a code, FR_COD (File Reference Code), which drives the loading of a second configuration table that we will call IODAY_CFT. FR_COD represents the arrival frequency. For the moment, I have defined some commonly used values:
          • AD = every day. The data file must arrive every day, so all days will be set in the IODAY_CFT table.
          • AWD = all working days. The data file must arrive only on working days, so all holidays, plus Saturdays and Sundays, will be null.
          • ? = the arrival day is unknown because it is variable. This is typical of monthly flows for which nobody knows exactly when they become available.
      • Based on the FR_COD code, the IODAY_CFT table is loaded by setting the expected day in the FR_YMD field.
  • 12. Reference day configuration
      • The third step is to insert into the IO_CFT table the information about the expected reference day.
      • The DR_COD code indicates what the reference day of the data in the data file should be. Remember that the reference day must be either present or implied. The logic applied to the FR_COD field also applies to the DR_COD field: it is used to set up the IODAY_CFT table. For the moment I have defined some commonly used values:
          • 0 = the reference day coincides with the current day.
          • 1 = the reference day is the day before, i.e. the current day - 1.
          • 1W = the reference day is the first preceding working day.
      • The IODAY_CFT table is configured only once, while configuring the data file. After that, it no longer needs to change.
      • Note that the codes are just a quick way to set up the IODAY_CFT table. Nothing prevents you from modifying the table manually or with ad-hoc SQL.
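To make the effect of the FR_COD and DR_COD codes concrete, here is a minimal Python sketch that builds calendar rows similar to those of IODAY_CFT. It is not the MEF implementation. Several points are assumptions: working days are taken to be Monday to Friday minus an optional `holidays` set, the `dr_ymd` name for the expected reference day is hypothetical (only FR_YMD is named in the text), and the `?` code is stored as "no expected day" because the text does not say how it is represented.

```python
from datetime import date, timedelta


def is_working_day(day, holidays=frozenset()):
    """Assumption: working days are Monday-Friday, excluding `holidays`."""
    return day.weekday() < 5 and day not in holidays


def previous_working_day(day, holidays=frozenset()):
    """Walk backwards from `day` until a working day is found."""
    prev = day - timedelta(days=1)
    while not is_working_day(prev, holidays):
        prev -= timedelta(days=1)
    return prev


def expected_arrival(day, fr_cod, holidays=frozenset()):
    """FR_YMD-like value: the day itself if a file is expected, else None."""
    if fr_cod == "AD":
        return day
    if fr_cod == "AWD":
        return day if is_working_day(day, holidays) else None
    if fr_cod == "?":
        return None  # assumption: a variable arrival day is left empty
    raise ValueError(f"unknown FR_COD: {fr_cod}")


def expected_reference_day(day, dr_cod, holidays=frozenset()):
    """Reference day expected in the file that arrives on `day`."""
    if dr_cod == "0":
        return day
    if dr_cod == "1":
        return day - timedelta(days=1)
    if dr_cod == "1W":
        return previous_working_day(day, holidays)
    raise ValueError(f"unknown DR_COD: {dr_cod}")


def build_calendar(first, last, fr_cod, dr_cod, holidays=frozenset()):
    """Return one (day, fr_ymd, dr_ymd) tuple per calendar day."""
    rows = []
    day = first
    while day <= last:
        fr_ymd = expected_arrival(day, fr_cod, holidays)
        dr_ymd = (expected_reference_day(day, dr_cod, holidays)
                  if fr_ymd is not None else None)
        rows.append((day, fr_ymd, dr_ymd))
        day += timedelta(days=1)
    return rows
```

With FR_COD = AWD and DR_COD = 1W, the row for Monday 14 March 2016 expects a file whose reference day is Friday 11 March 2016, while the rows for the weekend in between expect nothing.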
  • 13. Configuration of the correction factor
      • The OFF_COD code in IO_CFT indicates the correction factor to apply to the reference day indicated by the feeding system. OFF_COD plays no role in the checks; it corrects the day at run time. For the moment I have defined some commonly used codes:
          • 0 = the reference day coincides with the day indicated by the feeding system.
          • 1 = the reference day is the day before, i.e. the current day - 1.
          • 1W = the reference day is the previous working day.
      • The FROM_DR_YMD and TO_DR_YMD fields have the same meaning as the FR_COD field, but allow you to identify a range of possible reference days. For the moment, only one code has been defined:
          • PM = the month before that of the current calendar day.
      • MEF: the data file is in the folder .. dat and is called mtf_export_20160314.csv.
      • The control file with the expected number of rows is called mtf_export_20160314.row. It is also in the folder .. dat.
      • The file that configures the field structure of the data file is in the folder .. cft and is called mtf.csv.
      • The configuration file of the data file is called io_mtf.txt and is in the folder .. cft. It has the following settings:
  • 14. The configuration file
      IO_COD: MTF (file identifier)
      IO_DEB: Multilateral Trading Facilities (file description)
      TYPE_COD: FIN (file type - input file)
      SEC_COD: ESM (feeding system: ESMA)
      FRQ_COD: D (frequency - daily)
      FILE_LIKE_TXT: mtf_export%.csv (generic name of the file without the day)
      FILE_EXT_TXT: mtf_export_20160314.csv (name of the sample data file)
      HOST_NC: ., (priority on the decimal point)
      HEAD_CNT: 1 (number of rows in the header)
      FOO_CNT: 0 (number of rows in the tail)
      SEP_TXT: , (separator symbol if csv)
      START_NUM: 12 (starting character of the day in the name)
      SIZE_NUM: 8 (size of the day)
      RROW_NUM: 2 (row of the control file that holds the number of rows)
      RSTART_NUM: 8 (position where the number of rows begins)
      RSIZE_NUM: 6 (size of the number)
      MASK_TXT: YYYYMMDD (format of the day)
      FR_COD: AWD (file reference code)
      DR_COD: 1W (day reference code)
      OFF_COD: 1W (offset on the day reference)
      RCF_LIKE_TXT: mtf_export%.row (generic name of the control file without the day)
      RCF_EXT_TXT: mtf_export_20160314.row (name of the sample control file)
      FTB_TXT: NEWLINE (end-of-row indicator for the Oracle external table)
      TRUNC_COD: 1 (indicates whether the staging table should be truncated before loading)
      NOTE_IO_COD: MTF (presence of a notes file)
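The positional settings above (START_NUM, SIZE_NUM, RROW_NUM, RSTART_NUM, RSIZE_NUM) can be checked against the sample files with a short Python sketch. It assumes that positions and row numbers are 1-based, which is consistent with the sample names, and it maps the MASK_TXT value YYYYMMDD to the Python format `%Y%m%d`. The function names are hypothetical; MEF performs these steps inside the database.

```python
from datetime import datetime


def day_from_file_name(file_name, start_num=12, size_num=8, mask="%Y%m%d"):
    """Extract the day embedded in the file name.

    start_num is the 1-based position of the first character of the day,
    size_num is its length (START_NUM and SIZE_NUM in the configuration).
    """
    text = file_name[start_num - 1:start_num - 1 + size_num]
    return datetime.strptime(text, mask).date()


def rows_from_control_file(lines, rrow_num=2, rstart_num=8, rsize_num=6):
    """Read the declared number of rows from the control file.

    rrow_num is the 1-based row that holds the number, rstart_num the
    1-based position where it begins and rsize_num its maximum length.
    """
    line = lines[rrow_num - 1]
    return int(line[rstart_num - 1:rstart_num - 1 + rsize_num].strip())


control_file = [
    "#MTF CONTROL FILE OF 20160314",
    "ROWS = 160",
    "#END OF MTF CONTROL FILE OF 20160314",
]

print(day_from_file_name("mtf_export_20160314.csv"))  # 2016-03-14
print(rows_from_control_file(control_file))           # 160
```

Note that the day extracted from the name is the production day of the file; it still has to be corrected with OFF_COD before it can be compared with the expected reference day.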
  • 15. The configuration file
      • MEF: the DR_COD code is managed by the mef_sta_build.p_dr_cod function.
      • The FR_COD code is managed by the mef_sta_build.p_fr_cod function.
      • The OFF_COD code is managed by the mef_sta.f_off_cod function. See Recipe 12 on SlideShare for further details.
      • The functions that handle the day range are mef_sta_build.p_from_dr_cod and mef_sta_build.p_to_dr_cod. Other codes can therefore be defined by changing these functions.
      • The mef_sta_build.p_objday_cft will load the IODAY_CFT table.
      • The complete configuration of the data file is done by launching the procedure:
        SQL> @sta_conf_io MTF
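As an illustration of what the OFF_COD correction and the PM range code compute, here is a minimal Python sketch. It does not reproduce mef_sta.f_off_cod or the MEF range functions, which are PL/SQL; the names `apply_offset` and `previous_month_range` are hypothetical, the offset is applied to the day indicated by the feeding system, and working days are assumed to be Monday to Friday minus an optional `holidays` set.

```python
from datetime import date, timedelta


def apply_offset(indicated_day, off_cod, holidays=frozenset()):
    """Correct the day indicated by the feeding system according to OFF_COD."""
    if off_cod == "0":
        return indicated_day
    if off_cod == "1":
        return indicated_day - timedelta(days=1)
    if off_cod == "1W":
        # Step back until a day that is neither weekend nor holiday.
        prev = indicated_day - timedelta(days=1)
        while prev.weekday() >= 5 or prev in holidays:
            prev -= timedelta(days=1)
        return prev
    raise ValueError(f"unknown OFF_COD: {off_cod}")


def previous_month_range(day):
    """PM code: first and last day of the month before the month of `day`."""
    last_of_previous = day.replace(day=1) - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


# The sample file name carries 20160314 (a Monday); with OFF_COD = 1W
# the corrected reference day is the preceding Friday.
print(apply_offset(date(2016, 3, 14), "1W"))    # 2016-03-11
print(previous_month_range(date(2016, 3, 14)))  # February 2016, first to last day
```

In the MTF example this is the step that turns the production day written in the file name into the previous working day expected by the DR_COD setting.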
  • 16. The data file loading
      • The loading process of the data file must write to a log table the information about the processing day and the reference day received from the feeding system.
      • MEF: SQL> exec mef_job.p_run('sta_esm_mtf');
      • At the end of the load, by comparing what was configured with what was loaded, we can derive a final outcome of the process. This comparison can be displayed through a view that we will call IODAY_CFV.
      • The logic of the view was summarized in a previous figure. Based on this outcome, an intervention strategy must be agreed upon.
      • In our example, launched on a working day, we see a problem related to the reference day.
      • There is also another problem to investigate: the number of rows declared in the control file differs from the number of rows loaded.
  • 17. Conclusion
      • However we implement an ETL solution, the important point is that we need to know in advance the time characteristics of the data file we are going to load.
      • For each calendar day, it must be clear what we expect to receive on that day and, for any given data file, which reference day we expect to find inside it.
      • There can be no doubt or ambiguity: this is information that we need to know in advance and that we have to configure. After the Staging Area has been loaded, only the comparison between what we expected to receive and what we actually received allows us to evaluate the correctness of the loaded data.
      • Just remember that this correctness check has priority: it is the first check, and it refers only to the two time components of the data. Only if these checks pass does it make sense to continue with the other quality controls.
  • 18. References
      • On SlideShare: the series "Recipes of Data Warehouse and Business Intelligence".
      • Blog: http://microetlfoundation.blogspot.it and http://massimocenci.blogspot.it/
      • Micro ETL Foundation free source at https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k (last version: v2).
      • Email: massimo_cenci@yahoo.it