Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to verify the reference day
 

Recipes of Data Warehouse and Business Intelligence.

Staging Area: How to verify the reference day of the data source


    Presentation Transcript

    • Recipes of Data Warehouse and Business Intelligence. Staging Area: How to verify the reference day of the data source
    • The Micro ETL Foundation
      • The Micro ETL Foundation is a set of ideas and solutions for Data Warehouse and Business Intelligence projects in an Oracle environment.
      • It doesn't use expensive ETL tools, only your intelligence and your ability to think, configure, build and load data using the features and the programming language of your RDBMS.
      • This recipe is another easy example based on the slides of Recipes 1 and 2 of Data Warehouse and Business Intelligence.
      • By copying the content of the following slides into your editor and SQL interface utility, you can reproduce this example.
      • The solution presented here is the check of the reference day of a data source. It is another small component that helps us achieve complete control of the Staging Area loading.
    • The problem
      • In «Recipe 3» we saw the check on the number of rows. It ensured that if you received, for example, 943 rows from the data source file, you loaded 943 rows into the Staging Area table.
      • This is not enough.
      • Now we need to ensure that the reference day (DAY_KEY) of the data is correct.
      • Suppose that you start loading the account balances file now, Wednesday at 01:00. You expect the reference day of the balances to be yesterday, Tuesday. The loading day is today; the expected day is today-1.
      • If today is Monday and no processing runs during the weekend, the expected day is today-3, Friday.
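The rule described above (yesterday on a normal weekday, Friday when loading on Monday) can be sketched outside the database to check the logic. This is only an illustration in Python, not part of the recipe: the function names `expected_reference_day` and `to_day_key` are hypothetical, and holidays are ignored here.

```python
from datetime import date, timedelta


def expected_reference_day(loading_day: date) -> date:
    """Return the most recent Monday-Friday day strictly before loading_day."""
    day = loading_day - timedelta(days=1)
    while day.weekday() >= 5:  # weekday(): Monday=0 ... Saturday=5, Sunday=6
        day -= timedelta(days=1)
    return day


def to_day_key(day: date) -> int:
    """Format a date as a numeric YYYYMMDD key, in the style of DAY_KEY."""
    return int(day.strftime("%Y%m%d"))


# Loading on Wednesday 15 January 2014: the expected day is Tuesday the 14th.
print(to_day_key(expected_reference_day(date(2014, 1, 15))))  # 20140114
# Loading on Monday 13 January 2014: the expected day is Friday the 10th.
print(to_day_key(expected_reference_day(date(2014, 1, 13))))  # 20140110
```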
    • What to check
      • At the end of the load, we need to verify that everything went well: for this loading day, the reference day of the loaded data must be the expected day. To be sure of this, we must:
        1. Build a configuration table.
        2. Initialize the configuration table with the expected day for every loading day.
        3. Create a log table.
        4. Fill the log table at the end of the Staging Area load.
      • Without this check we risk reloading the same day if the source system had problems or was not able to overwrite the previous file.
    • The daily configuration table
      • IO_COD is the same code used in the configuration table created in «Recipe 2», where it is unique.
      • DAY_KEY is the loading day.
      • DY_TXT is the day of the week.
      • WD_FLG is the working-day flag.
      • FROM_EDAY_KEY/TO_EDAY_KEY are the expected range of days. For a daily data source file they have the same value.
      • A range is useful for monthly data files that contain all the days of the month.
      • Next we see how to configure 3 years.

      DROP TABLE STA_IODAY_CFT;
      CREATE TABLE STA_IODAY_CFT (
         IO_COD        VARCHAR2(12),
         DAY_KEY       NUMBER,
         DY_TXT        VARCHAR2(3),
         WD_FLG        NUMBER,
         FROM_EDAY_KEY NUMBER,
         TO_EDAY_KEY   NUMBER
      );
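To illustrate why a range is useful, here is a small Python sketch of how FROM_EDAY_KEY and TO_EDAY_KEY could be computed for a monthly file. It assumes, purely for the example, that a monthly file loaded on a given day carries all the days of the previous month; the slides do not state this, and the function name `previous_month_range` is hypothetical.

```python
import calendar
from datetime import date
from typing import Tuple


def previous_month_range(loading_day: date) -> Tuple[int, int]:
    """Return (FROM_EDAY_KEY, TO_EDAY_KEY) covering the month before loading_day."""
    year, month = loading_day.year, loading_day.month - 1
    if month == 0:  # a load in January looks back to December of the prior year
        year, month = year - 1, 12
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    base = year * 10000 + month * 100  # YYYYMM00
    return base + 1, base + last_day


print(previous_month_range(date(2014, 3, 3)))  # (20140201, 20140228)
```

For a daily file the two values collapse to the same day, which is the case configured in this recipe.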
    • The load of the daily table
      • This table needs to be loaded only once.
      • In this example we configure a daily load (excluding weekends) from one year in the future (SYSDATE+365) back to two years in the past.
      • Configure the holidays according to your country's calendar. Only Christmas Day, New Year's Day and Independence Day are configured here.

      -- Note: the 'dy' format returns 'sat'/'sun' only when NLS_DATE_LANGUAGE is English.
      INSERT INTO STA_IODAY_CFT
         (IO_COD,DAY_KEY,DY_TXT,WD_FLG,FROM_EDAY_KEY,TO_EDAY_KEY)
      WITH X AS (
         -- every calendar day in the 3-year window
         SELECT TO_CHAR((SYSDATE+365)-LEVEL,'YYYYMMDD') AS DAY_KEY
               ,TO_CHAR((SYSDATE+365)-LEVEL,'dy') DY_TXT
         FROM DUAL
         CONNECT BY LEVEL <= (365*3))
      ,Y AS (
         -- working days only; EDAY_KEY is the previous working day
         SELECT TO_CHAR((SYSDATE+365)-LEVEL,'YYYYMMDD') AS DAY_KEY
               ,FIRST_VALUE(TO_CHAR((SYSDATE+365)-LEVEL,'YYYYMMDD'))
                   OVER (ORDER BY TO_CHAR((SYSDATE+365)-LEVEL,'YYYYMMDD')
                         ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS EDAY_KEY
         FROM DUAL
         WHERE TO_CHAR((SYSDATE+365)-LEVEL,'dy') NOT IN ('sun','sat')
         AND SUBSTR(TO_CHAR(SYSDATE+365-LEVEL,'YYYYMMDD'),5)
             NOT IN ('0101','0704','1225')
         CONNECT BY LEVEL <= (365*3)
         ORDER BY 1 DESC)
      SELECT 'employees1'
            ,X.DAY_KEY,X.DY_TXT
            ,(CASE WHEN (EDAY_KEY IS NULL) THEN 0 ELSE 1 END)
            ,Y.EDAY_KEY,Y.EDAY_KEY
      FROM X
      LEFT OUTER JOIN Y ON (X.DAY_KEY=Y.DAY_KEY)
      ORDER BY 1,2 DESC;
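To make the result of this load easier to picture, the following Python sketch builds the same kind of rows for a short date range. It is an illustration under assumptions, not a replacement for the SQL: `build_config_rows`, `is_working_day`, `HOLIDAYS` and `DAY_NAMES` are hypothetical names, and the first working day of the range is given itself as expected day, mirroring what the window of one preceding row returns when there is no preceding row.

```python
from datetime import date, timedelta

HOLIDAYS = {"0101", "0704", "1225"}  # MMDD keys, the three holidays in the example
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def is_working_day(day: date) -> bool:
    """A working day is Monday-Friday and not one of the configured holidays."""
    return day.weekday() < 5 and day.strftime("%m%d") not in HOLIDAYS


def build_config_rows(io_cod: str, first_day: date, last_day: date) -> list:
    """Return (IO_COD, DAY_KEY, DY_TXT, WD_FLG, FROM_EDAY_KEY, TO_EDAY_KEY) rows."""
    rows = []
    previous_working = None  # DAY_KEY of the last working day seen so far
    day = first_day
    while day <= last_day:
        key = int(day.strftime("%Y%m%d"))
        dy_txt = DAY_NAMES[day.weekday()]
        if is_working_day(day):
            # Expected day = previous working day, or the day itself when the
            # range has no earlier working day.
            expected = previous_working if previous_working is not None else key
            rows.append((io_cod, key, dy_txt, 1, expected, expected))
            previous_working = key
        else:
            # Non-working day: flag 0 and no expected day.
            rows.append((io_cod, key, dy_txt, 0, None, None))
        day += timedelta(days=1)
    return rows


for row in build_config_rows("employees1", date(2014, 7, 3), date(2014, 7, 7)):
    print(row)
```

In this range, Friday 4 July 2014 is a holiday, so the row for Monday 7 July expects Thursday 3 July.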
    • The daily log table
      • This table must contain a row for every data source file.
      • DAY_KEY is the loading day.
      • The other day columns are the expected reference day (FROM_EDAY_KEY/TO_EDAY_KEY) and the actual reference day (FROM_DAY_KEY/TO_DAY_KEY).
      • RET_COD contains the result code of the check and will be 'OK' or 'NOT OK'.

      DROP TABLE STA_IODAY_LOT;
      CREATE TABLE STA_IODAY_LOT (
         IO_COD        VARCHAR2(12) NOT NULL,
         SOURCE_COD    VARCHAR2(80) NOT NULL,
         DAY_KEY       NUMBER,
         FROM_EDAY_KEY NUMBER,
         TO_EDAY_KEY   NUMBER,
         FROM_DAY_KEY  NUMBER,
         TO_DAY_KEY    NUMBER,
         RET_COD       VARCHAR2(30),
         STAMP_DTS     DATE
      );
    • The load of the log table
      • We use the test case of «Recipe 2» and the data loaded into the STA_EMPLOYEES1_STT table (so it has an old DAY_KEY).
      • We calculate the min and max DAY_KEY from the Staging Area table.
      • We compare the received day(s) with the expected day(s).

      INSERT INTO STA_IODAY_LOT (
         IO_COD, SOURCE_COD, DAY_KEY,
         FROM_EDAY_KEY, TO_EDAY_KEY,
         FROM_DAY_KEY, TO_DAY_KEY,
         RET_COD, STAMP_DTS)
      WITH X AS (
         -- range of reference days actually found in the Staging Area table
         SELECT SOURCE_COD
               ,TO_CHAR(SYSDATE,'YYYYMMDD') DAY_KEY
               ,MIN(DAY_KEY) FROM_DAY_KEY
               ,MAX(DAY_KEY) TO_DAY_KEY
         FROM STA_EMPLOYEES1_STT
         GROUP BY SOURCE_COD)
      SELECT A.IO_COD
            ,X.SOURCE_COD
            ,X.DAY_KEY
            ,A.FROM_EDAY_KEY,A.TO_EDAY_KEY
            ,X.FROM_DAY_KEY,X.TO_DAY_KEY
            -- compare each end of the range on its own: comparing the sums
            -- could report OK for two errors that offset each other
            ,(CASE WHEN (A.FROM_EDAY_KEY=X.FROM_DAY_KEY
                     AND A.TO_EDAY_KEY=X.TO_DAY_KEY)
                   THEN 'OK' ELSE 'NOT OK' END)
            ,SYSDATE
      FROM STA_IODAY_CFT A
      LEFT OUTER JOIN X ON (A.DAY_KEY = X.DAY_KEY)
      WHERE A.IO_COD = 'employees1'
      AND A.DAY_KEY = TO_CHAR(SYSDATE,'YYYYMMDD');
      COMMIT;
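The comparison between received and expected days can also be sketched in Python to try out a few cases by hand. This is a simplified illustration, not the recipe's implementation: the function name `check_reference_days` is hypothetical, the received day keys are passed in as a plain list instead of being read from STA_EMPLOYEES1_STT, and each end of the range is checked on its own.

```python
def check_reference_days(expected_from, expected_to, received_day_keys):
    """Return (FROM_DAY_KEY, TO_DAY_KEY, RET_COD) for one load.

    expected_from/expected_to play the role of FROM_EDAY_KEY/TO_EDAY_KEY;
    received_day_keys are the DAY_KEY values found in the staging table.
    """
    if not received_day_keys:
        # Nothing was loaded, so there is no reference day to match.
        return None, None, "NOT OK"
    actual_from = min(received_day_keys)
    actual_to = max(received_day_keys)
    matches = expected_from == actual_from and expected_to == actual_to
    return actual_from, actual_to, "OK" if matches else "NOT OK"


# The file carries the expected day.
print(check_reference_days(20140114, 20140114, [20140114, 20140114]))
# The source system did not overwrite the previous file: old day received.
print(check_reference_days(20140114, 20140114, [20140113, 20140113]))
```

The second call shows the situation this recipe guards against: the row count can be correct while the reference day is stale.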
    • Conclusion
      • We are at the end of this recipe. The configuration and log tables are STA_IODAY_CFT and STA_IODAY_LOT.
      • With only two tables, we have gained control of the reference day of the source data without ETL tools.
      • This is the philosophy of the Micro ETL Foundation.
      • Email - massimo_cenci@yahoo.it
      • Blog (Italian/English) - http://massimocenci.blogspot.it/