Data Warehouse and Business Intelligence - Recipe 3

How to check the staging area loading

Published in: Technology

  1. Recipes of Data Warehouse and Business Intelligence: How to check the Staging Area Loading
  2. The Micro ETL Foundation
     • The Micro ETL Foundation is a set of ideas and solutions for Data Warehouse and Business Intelligence projects in an Oracle environment.
     • It does not use expensive ETL tools, only your intelligence and your ability to think, configure, build and load data using the features and the programming language of your RDBMS.
     • This recipe is another easy example, based on the slides of Recipes 1 and 2 of Data Warehouse and Business Intelligence.
     • By copying the content of the following slides into your editor and SQL interface utility, you can reproduce this example.
     • The solution presented here is the check of the Staging Area loading.
  3. The load of the data file
     • Configure and load the source data file according to the slides of «Recipes 2 of Data Warehouse and Business Intelligence».
     • Copy the SQL statement into a file.
     • Run the script and you will load a Staging Table with a «click».
     • Now we will see how to verify the load process.
     The source data file is the following:

         EMPLOYEE_ID  FIRST_NAME  LAST_NAME    EMAIL     PHONE_NUMBER   HIRE_DATE   JOB_ID    SALARY  COMMISSION_PCT  MANAGER_ID  DEPARTMENT_ID
         117          Sigal       Tobias       STOBIAS   5.151.274.564  24/07/2005  PU_CLERK  2800                    114         30
         118          Guy         Himuro       GHIMURO   5.151.274.565  15/11/2006  PU_CLERK  2600                    114         30
         119          Karen       Colmenares   KCOLMENA  5.151.274.566  10/08/2007  PU_CLERK  2500                    114         30
         120          Matthew     Weiss        MWEISS    6.501.231.234  18/07/2004  ST_MAN    8000                    100         50
         121          Adam        Fripp        AFRIPP    6.501.232.234  10/04/2005  ST_MAN    8200                    100         50
         122          Payam       Kaufling     PKAUFLIN  6.501.233.234  01/05/2003  ST_MAN    7900                    100         50
         123          Shanta      Vollman      SVOLLMAN  6.501.234.234  10/10/2005  ST_MAN    6500                    100         50
         124          Kevin       Mourgos      KMOURGOS  6.501.235.234  16/11/2007  ST_MAN    5800                    100         50
         125          Julia       Nayer        JNAYER    6.501.241.214  16/07/2005  ST_CLERK  3200                    120         50
         126          Irene       Mikkilineni  IMIKKILI  6.501.241.224  28/09/2006  ST_CLERK  2700                    120         50
  4. The load process
     • The objects involved in the process are shown in the next figure.
     [Figure: the load process in five numbered steps (1 to 5). Objects shown: File System, Source Data File, Row File, Row External Table (RXT), Source External Table (FXT), Source External View (FXV), Configuration External Table (CXT), Configuration External View (CXV), File Definition Table (CFT), Load, Staging Table (STT).]
  5. What to check
     • At the end of the load, we need to verify that everything went well: the number of rows in the Staging Table must be correct. To be sure of this, we must show that the following five counts are all exactly the same:
       1. The number of rows declared in the .row file
       2. The number of rows in the source data file
       3. The number of rows in the external table that refers to the data file
       4. The number of rows in the view built on the external table
       5. The number of rows in the staging table
     • Now let us see what we need.
  6. The detail check table
     • Build a check table to contain the results of the checks.
     • IO_COD is the same code used in the configuration table created in «Recipes 2».
     • SEQ_NUM is a global sequential number taken from an Oracle sequence.
     • SOURCE_COD is the name of the data file.
     • SORT_NUM is a sort number within the IO_COD.
     • CHECK_DET is a description of the check.
     • N1_VAL is the row count.
     • STAMP_DTS is the SYSDATE.

         DROP TABLE STA_CHK_LOT;
         CREATE TABLE STA_CHK_LOT (
            IO_COD      VARCHAR2(12)  NOT NULL
           ,SEQ_NUM     NUMBER        NOT NULL
           -- 80 characters, to match SUBSTR(LOCATION,1,80) in the inserts
           -- and SOURCE_COD in STA_IO_LOT (the original slide declared 24)
           ,SOURCE_COD  VARCHAR2(80)  NOT NULL
           ,SORT_NUM    NUMBER        NOT NULL
           ,CHECK_DET   VARCHAR2(600) NOT NULL
           ,N1_VAL      NUMBER        NOT NULL
           ,STAMP_DTS   DATE          NOT NULL
         );

         DROP SEQUENCE STA_CHK_SEQ;
         CREATE SEQUENCE STA_CHK_SEQ START WITH 1 INCREMENT BY 1;
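     On a first run the plain DROP statements above raise an error because the objects do not exist yet. A minimal sketch of a re-runnable variant follows; it assumes you run the script from a client that accepts anonymous PL/SQL blocks (such as SQL*Plus) and relies only on the standard error codes ORA-00942 and ORA-02289. It is an optional convenience, not part of the original recipe.

     ```sql
     -- Drop the detail check table only if it exists
     -- (ORA-00942 = table or view does not exist)
     BEGIN
       EXECUTE IMMEDIATE 'DROP TABLE STA_CHK_LOT';
     EXCEPTION
       WHEN OTHERS THEN
         IF SQLCODE != -942 THEN
           RAISE;  -- any other error is unexpected: re-raise it
         END IF;
     END;
     /

     -- Same idea for the sequence
     -- (ORA-02289 = sequence does not exist)
     BEGIN
       EXECUTE IMMEDIATE 'DROP SEQUENCE STA_CHK_SEQ';
     EXCEPTION
       WHEN OTHERS THEN
         IF SQLCODE != -2289 THEN
           RAISE;
         END IF;
     END;
     /
     ```

     The CREATE statements from the slide can then follow unchanged.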
  7. The summary check table
     • Build a summary check table that holds, in a single row, the results from the previous table.
     • IO_COD is the same code used in the configuration table created in «Recipes 2».
     • The *_CNT columns are the row counts from the 5 checks listed in slide 5.
     • RET_COD will be the final result (OK or NOT OK).
     • STAMP_DTS is the SYSDATE.

         DROP TABLE STA_IO_LOT;
         CREATE TABLE STA_IO_LOT (
            IO_COD      VARCHAR2(12) NOT NULL
           ,SOURCE_COD  VARCHAR2(80) NOT NULL
           ,DEC_CNT     NUMBER
           ,FIL_CNT     NUMBER
           ,FXT_CNT     NUMBER
           ,FXV_CNT     NUMBER
           ,STT_CNT     NUMBER
           ,RET_COD     VARCHAR2(30)
           ,STAMP_DTS   DATE
         );
  8. The count rows function
     • At this point we need to write some PL/SQL code. You could also write it in Java or another programming language.
     • This function counts the number of lines in the source data file.
     • It has two parameters: the folder (an Oracle directory) and the file name.
     • That is all. Now we can load the two check tables.

         CREATE OR REPLACE FUNCTION F_COUNT_FILE_ROWS(
            P_DIR        VARCHAR2
           ,P_FILE_NAME  VARCHAR2
         ) RETURN NUMBER IS
           V_F      UTL_FILE.FILE_TYPE;
           V_COUNT  NUMBER;
           V_LINE   VARCHAR2(2000);
         BEGIN
           V_COUNT := 0;
           V_F := UTL_FILE.FOPEN(P_DIR, P_FILE_NAME, 'R');
           LOOP
             -- GET_LINE raises NO_DATA_FOUND at end of file;
             -- that exception is the only way out of this loop
             UTL_FILE.GET_LINE(V_F, V_LINE);
             V_COUNT := V_COUNT + 1;
           END LOOP;
         EXCEPTION
           WHEN NO_DATA_FOUND THEN
             -- End of file reached: close the file and return the count
             UTL_FILE.FCLOSE(V_F);
             RETURN V_COUNT;
         END;
         /
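     Before wiring the function into the check tables, it can be tried on its own. The sketch below assumes the Oracle directory STA_BCK that later slides use; the file name 'employees1.txt' is hypothetical and only illustrates the call.

     ```sql
     -- 'STA_BCK' is the Oracle directory used later in this recipe;
     -- 'employees1.txt' is a hypothetical file name, replace it with your own.
     SELECT F_COUNT_FILE_ROWS('STA_BCK', 'employees1.txt') AS FILE_ROWS
     FROM DUAL;
     ```

     The function counts every line of the file, header and footer lines included. That is why the later checks on the view and on the staging table add HEAD_CNT and FOO_CNT before comparing.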
  9. The declared rows
     • Insert this number with the following SQL statement.
     • It uses the Oracle dictionary to find the file name.
     • It uses the source external view to calculate the number.

         INSERT INTO STA_CHK_LOT (
           IO_COD, SEQ_NUM, SOURCE_COD, SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS)
         VALUES (
            'employees1'
           ,STA_CHK_SEQ.NEXTVAL
           ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS
             WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
           ,1
           ,'DECLARED'
           ,(SELECT NVL(MAX(ROWS_NUM),0) FROM STA_EMPLOYEES1_FXV)
           ,SYSDATE
         );
  10. The file rows
     • Insert this number with the following SQL statement.
     • It uses the Oracle dictionary to find the file name.
     • It uses the function to calculate the number.

         INSERT INTO STA_CHK_LOT (
           IO_COD, SEQ_NUM, SOURCE_COD, SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS)
         VALUES (
            'employees1'
           ,STA_CHK_SEQ.NEXTVAL
           ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS
             WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
           ,2
           ,'FILE'
           ,NVL(F_COUNT_FILE_ROWS('STA_BCK',
                  (SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS
                   WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')),0)
           ,SYSDATE
         );
  11. The external table rows
     • Insert this number with the following SQL statement.
     • It uses the Oracle dictionary to find the file name.
     • It uses the external table to calculate the number.

         INSERT INTO STA_CHK_LOT (
           IO_COD, SEQ_NUM, SOURCE_COD, SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS)
         VALUES (
            'employees1'
           ,STA_CHK_SEQ.NEXTVAL
           ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS
             WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
           ,3
           ,'EXTERNAL TABLE (STA_EMPLOYEES1_FXT)'
           ,(SELECT NVL(COUNT(*),0) FROM STA_EMPLOYEES1_FXT)
           ,SYSDATE
         );
  12. The external view rows
     • Insert this number with the following SQL statement.
     • It uses the Oracle dictionary to find the file name.
     • It uses the external view and the configuration table to calculate the number: the header and footer line counts (HEAD_CNT, FOO_CNT) are added to the rows returned by the view.

         INSERT INTO STA_CHK_LOT (
           IO_COD, SEQ_NUM, SOURCE_COD, SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS)
         VALUES (
            'employees1'
           ,STA_CHK_SEQ.NEXTVAL
           ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS
             WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
           ,4
           ,'EXTERNAL VIEW (STA_EMPLOYEES1_FXV)'
           ,(SELECT NVL(COUNT(*),0) FROM STA_EMPLOYEES1_FXV)
            +(SELECT HEAD_CNT+FOO_CNT FROM STA_IO_CFT WHERE IO_COD = 'employees1')
           ,SYSDATE
         );
  13. The staging table rows
     • Insert this number with the following SQL statement.
     • It uses the Oracle dictionary to find the file name.
     • It uses the staging table and the configuration table to calculate the number: the header and footer line counts (HEAD_CNT, FOO_CNT) are added to the rows in the staging table.

         INSERT INTO STA_CHK_LOT (
           IO_COD, SEQ_NUM, SOURCE_COD, SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS)
         VALUES (
            'employees1'
           ,STA_CHK_SEQ.NEXTVAL
           ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS
             WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
           ,5
           ,'STAGING TABLE (STA_EMPLOYEES1_STT)'
           ,(SELECT NVL(COUNT(*),0) FROM STA_EMPLOYEES1_STT)
            +(SELECT HEAD_CNT+FOO_CNT FROM STA_IO_CFT WHERE IO_COD = 'employees1')
           ,SYSDATE
         );
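     Once the five detail rows are in place, it can help to look at them side by side before building the summary. The query below is a small sketch that uses only the STA_CHK_LOT columns defined in this recipe; the ordering is an illustrative choice.

     ```sql
     -- List the five checks for this load in their logical order.
     -- If every N1_VAL is the same, the load is consistent.
     SELECT SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS
     FROM STA_CHK_LOT
     WHERE IO_COD = 'employees1'
     ORDER BY SORT_NUM, SEQ_NUM;
     ```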
  14. The summary check
     • Insert the summary check with the following SQL statement.
     • It uses the detail table.
     • It uses the PIVOT clause, available from Oracle 11g (but you can use something else).

         INSERT INTO STA_IO_LOT (
           IO_COD, SOURCE_COD, DEC_CNT, FIL_CNT, FXT_CNT, FXV_CNT, STT_CNT, RET_COD, STAMP_DTS)
         SELECT
            IO_COD, SOURCE_COD, DEC_CNT, FIL_CNT, FXT_CNT, FXV_CNT, STT_CNT
           ,(CASE WHEN (DEC_CNT=FIL_CNT AND FIL_CNT=FXT_CNT
                    AND FXT_CNT=FXV_CNT AND FXV_CNT=STT_CNT)
                  THEN 'OK' ELSE 'NOT OK' END)
           ,SYSDATE
         FROM (SELECT IO_COD, SOURCE_COD, SORT_NUM, N1_VAL
               FROM STA_CHK_LOT
               WHERE SOURCE_COD = (SELECT SUBSTR(LOCATION,1,80)
                                   FROM USER_EXTERNAL_LOCATIONS
                                   WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT'))
         PIVOT (
           SUM(N1_VAL) FOR SORT_NUM IN (
             1 AS DEC_CNT, 2 AS FIL_CNT, 3 AS FXT_CNT, 4 AS FXV_CNT, 5 AS STT_CNT)
         );
         COMMIT;
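     The slide notes that something other than PIVOT can be used. One common alternative is conditional aggregation with CASE and GROUP BY, which also works on Oracle versions older than 11g. The query below is a sketch of that alternative, not the author's statement: it only selects the summary row (so that it can be run without inserting duplicates) and filters on IO_COD for brevity.

     ```sql
     -- Turn the five detail rows into one summary row without PIVOT.
     -- Each SUM(CASE ...) picks the N1_VAL of one check (SORT_NUM 1..5).
     SELECT T.*
           ,CASE WHEN DEC_CNT = FIL_CNT AND FIL_CNT = FXT_CNT
                  AND FXT_CNT = FXV_CNT AND FXV_CNT = STT_CNT
                 THEN 'OK' ELSE 'NOT OK' END AS RET_COD
     FROM (
       SELECT IO_COD
             ,SOURCE_COD
             ,SUM(CASE WHEN SORT_NUM = 1 THEN N1_VAL END) AS DEC_CNT
             ,SUM(CASE WHEN SORT_NUM = 2 THEN N1_VAL END) AS FIL_CNT
             ,SUM(CASE WHEN SORT_NUM = 3 THEN N1_VAL END) AS FXT_CNT
             ,SUM(CASE WHEN SORT_NUM = 4 THEN N1_VAL END) AS FXV_CNT
             ,SUM(CASE WHEN SORT_NUM = 5 THEN N1_VAL END) AS STT_CNT
       FROM STA_CHK_LOT
       WHERE IO_COD = 'employees1'
       GROUP BY IO_COD, SOURCE_COD
     ) T;
     ```

     If one of the five checks is missing, its column is NULL, the comparison is not true, and RET_COD falls through to 'NOT OK', which is the same behaviour as the PIVOT version.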
  15. Conclusion
     We are at the end of this recipe. The slide shows the final contents of the two check tables.
     With only two log tables, a function and a few SQL statements, we have built a check of a Staging Area table load, without ETL tools. This is the philosophy of the Micro ETL Foundation.
     Email - massimo_cenci@yahoo.it
     Blog (italian/english) - http://massimocenci.blogspot.it/