Recipes of Data Warehouse and
Business Intelligence

Load a Data Source File (with header, footer and
fixed-length columns) into a Staging Area table
with a click
The Micro ETL Foundation
• The Micro ETL Foundation is a set of ideas and solutions for Data Warehouse and
Business Intelligence projects in an Oracle environment.
• It does not use expensive ETL tools, but only your intelligence and your ability to
think, configure, build and load data using the features and the programming
language of your RDBMS.
• This recipe is an easy example: by copying the content of the following slides into
your editor and SQL interface utility, you can reproduce it.
The source data file
• Get the data file to load. In this recipe we use a data file with these features:
• Four initial header rows. The reference day of the data is in the first row, in
«dd/mm/yyyy» format.
• One tail row with the number of records of the data file.
• Columns of fixed size (we will configure them later).
• The next figure is the content of the data file, which we call employees4.txt.
Note that the slide layout wraps the fixed-width records, so the columns appear
in groups.
BANKIN1431/12/20130000
BEGINHEADER
BANKIN1400 EMPLOYEES
ENDHEADER
100Steven King
SKING
101Neena Kochhar
NKOCHHAR
102Lex De Haan
LDEHAAN
145John Russell
JRUSSEL
146Karen Partners
KPARTNER
147Alberto Errazuriz
AERRAZUR
148Gerald Cambrault
GCAMBRAU
149Eleni Zlotkey
EZLOTKEY
150Peter Tucker
PTUCKER
BANKIN1431/12/201300000000009

5.151.234.567
5.151.234.568
5.151.234.569
011.44.1344.429268
011.44.1344.467268
011.44.1344.429278
011.44.1344.619268
011.44.1344.429018
011.44.1344.129268

17/06/2003AD_PRES
21/09/2005AD_VP
13/01/2001AD_VP
01/10/2004SA_MAN
05/01/2005SA_MAN
10/03/2005SA_MAN
15/10/2007SA_MAN
29/01/2008SA_MAN
30/01/2005SA_REP

24000
17000
17000
14000
13500
12000
11000
10500
10000

100
100
0.04100
0.03100
0.03100
0.03100
0.02100
0.03145

90
90
90
80
80
80
80
80
80
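To make the fixed-width layout concrete, here is a small Python sketch (Python is used here only as a scratchpad; it is not part of the recipe). The record is a hypothetical reconstruction of the first employee's values, re-padded to the configured column sizes, since the slide listing strips trailing blanks:

```python
# COLSIZE_NUM values from the definition file (total 139 characters per record)
sizes = [6, 20, 25, 25, 20, 10, 10, 9, 4, 6, 4]
# Hypothetical values for the first detail record (commission and manager empty)
values = ["100", "Steven", "King", "SKING", "5.151.234.567",
          "17/06/2003", "AD_PRES", "24000", "", "", "90"]

# Rebuild the 139-character fixed-width record by left-justifying each field
record = "".join(v.ljust(s) for v, s in zip(values, sizes))
assert len(record) == sum(sizes) == 139

# Slicing back by position recovers the fields, like POSITION(a:b) does
start, fields = 0, []
for s in sizes:
    fields.append(record[start:start + s].rstrip())
    start += s
```

The slicing loop is exactly what the external table's POSITION clauses do declaratively.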
The definition file
• Build the definition file from your documentation.
• It has to be a «csv» file, because it must be readable by an external table.
• For this example we define the minimum set of information:
• COLUMN_COD will be the name of the column in the DWH.
• FXV_TXT contains the small transformations to be applied.
• COLSIZE_NUM is the size of the column in the data file.
• The next listing is the content of the definition file, which we call employees4.csv.
COLUMN_ID;HOST_COLUMN_COD;COLUMN_COD;TYPE_TXT;COLSIZE_NUM;FXV_TXT
1;EMPLOYEE_ID;EMPLOYEE_ID;NUMBER (6);6;to_number(EMPLOYEE_ID)
2;FIRST_NAME;FIRST_NAME;VARCHAR2(20);20;
3;LAST_NAME;LAST_NAME;VARCHAR2(25);25;
4;EMAIL;EMAIL;VARCHAR2(25);25;
5;PHONE_NUMBER;PHONE_NUMBER;VARCHAR2(20);20;replace(PHONE_NUMBER,'.','')
6;HIRE_DATE;HIRE_DATE;NUMBER;10;TO_NUMBER(to_char(to_date(HIRE_DATE,'dd/mm/yyyy'),'yyyymmdd'))
7;JOB_ID;JOB_ID;VARCHAR2(10);10;
8;SALARY;SALARY;NUMBER (8,2);9;to_number(SALARY)
9;COMMISSION_PCT;COMMISSION_PCT;NUMBER (2,2);4;to_number(COMMISSION_PCT,'99.99')
10;MANAGER_ID;MANAGER_ID;NUMBER (6);6;to_number(MANAGER_ID)
11;DEPARTMENT_ID;DEPARTMENT_ID;NUMBER (4);4;to_number(DEPARTMENT_ID)
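The two non-trivial FXV_TXT transformations are the date-to-day-key conversion and the phone cleanup. A quick Python sketch of the same logic (only as an illustration; the real work is done by the Oracle expressions):

```python
from datetime import datetime

def day_key(s: str) -> int:
    """Mirrors TO_NUMBER(to_char(to_date(HIRE_DATE,'dd/mm/yyyy'),'yyyymmdd')):
    a dd/mm/yyyy string becomes a numeric yyyymmdd day key."""
    return int(datetime.strptime(s, "%d/%m/%Y").strftime("%Y%m%d"))

def clean_phone(s: str) -> str:
    """Mirrors replace(PHONE_NUMBER,'.',''): strip the dot separators."""
    return s.replace(".", "")
```

For example, day_key("17/06/2003") yields 20030617, the numeric key format used throughout the staging layer.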
The physical/logical environment
• Create two operating system folders: the
first for the data file and the second for
the configuration file (c:\ios and
c:\ios\cft).
• Create the Oracle directories needed by
the external table definitions.
• Place the data file and the configuration
file in their folders.

DROP DIRECTORY STA_BCK;
CREATE DIRECTORY STA_BCK AS 'c:\ios';
DROP DIRECTORY STA_LOG;
CREATE DIRECTORY STA_LOG AS 'c:\ios';
DROP DIRECTORY STA_RCV;
CREATE DIRECTORY STA_RCV AS 'c:\ios';
DROP DIRECTORY STA_CFT;
CREATE DIRECTORY STA_CFT AS 'c:\ios\cft';
DROP DIRECTORY STA_CFT_LOG;
CREATE DIRECTORY STA_CFT_LOG AS 'c:\ios\cft';
The source configuration table
• Create the configuration table for the data
source shown on slide 3.
• It contains the unique identifier of the data
source (IO_COD).
• It contains the folder references (*_DIR).
• It contains information about the format of
the different types of data source.
• Only some fields will be configured.

DROP TABLE STA_IO_CFT;
CREATE TABLE STA_IO_CFT (
  IO_COD        VARCHAR2(12),
  RCV_DIR       VARCHAR2(30),
  BCK_DIR       VARCHAR2(30),
  LOG_DIR       VARCHAR2(30),
  HEAD_CNT      NUMBER,
  FOO_CNT       NUMBER,
  SEP_TXT       VARCHAR2(1),
  IDR_NUM       NUMBER,
  IDC_NUM       NUMBER,
  IDS_NUM       NUMBER,
  IDF_TXT       VARCHAR2(30),
  EDC_NUM       NUMBER,
  EDS_NUM       NUMBER,
  EDF_TXT       VARCHAR2(30),
  RCR_NUM       NUMBER,
  RCC_NUM       NUMBER,
  RCS_NUM       NUMBER,
  RCF_LIKE_TXT  VARCHAR2(30),
  FILE_LIKE_TXT VARCHAR2(60)
);
The load of the configuration table
• Load the previous table according to the
features of slide 3:
• The folder references (rcv_dir, bck_dir, log_dir).
• The name of the source file (file_like_txt).
• The number of header rows (head_cnt) and of
footer rows (foo_cnt).
• The separator character (sep_txt): NULL,
because this is not a csv file.
• The row, position and size, in the header, of
the reference day of the source, and its
format (idr_num, idc_num, ids_num, idf_txt).
• The offset from the tail, the position and the
size, in the footer section, of the number of
rows of the source (rcr_num, rcc_num, rcs_num).

DELETE STA_IO_CFT
WHERE IO_COD = 'employees4';
INSERT INTO STA_IO_CFT (
IO_COD
,RCV_DIR,BCK_DIR,LOG_DIR
,FILE_LIKE_TXT
,HEAD_CNT,FOO_CNT,SEP_TXT
,IDR_NUM,IDC_NUM,IDS_NUM,IDF_TXT
,RCR_NUM,RCC_NUM,RCS_NUM
)
VALUES (
'employees4'
,'STA_RCV','STA_BCK','STA_LOG'
,'employees4.txt'
,4,1,NULL
,1,9,10,'DD/MM/YYYY'
,0,19,13
);
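The header/footer coordinates just loaded can be read back mechanically. A small Python sketch (illustration only, using the first and last rows of employees4.txt) of how the configured positions address the reference day and the declared row count:

```python
# First and last rows of employees4.txt
header = "BANKIN1431/12/20130000"
footer = "BANKIN1431/12/201300000000009"

# Configured coordinates: reference day in row 1, column 9, 10 chars;
# row count 0 rows from the tail, column 19, up to 13 chars.
idc_num, ids_num = 9, 10
rcc_num, rcs_num = 19, 13

# 1-based column positions, like Oracle's SUBSTR(ROW_TXT, pos, len)
ref_day = header[idc_num - 1 : idc_num - 1 + ids_num]
declared_rows = int(footer[rcc_num - 1 : rcc_num - 1 + rcs_num])
```

This yields the reference day '31/12/2013' and the declared count of 9 detail records, exactly what the final view extracts with SUBSTR.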
The configuration table of the definition file
• Create the configuration table of the
data structure shown on slide 4.
• It is a metadata table.
• You can add other info, such as the column
description.

DROP TABLE STA_EMPLOYEES4_CXT;
CREATE TABLE STA_EMPLOYEES4_CXT (
  COLUMN_ID       VARCHAR2(4),
  HOST_COLUMN_COD VARCHAR2(30),
  COLUMN_COD      VARCHAR2(30),
  TYPE_TXT        VARCHAR2(30),
  COLSIZE_NUM     VARCHAR2(4),
  FXV_TXT         VARCHAR2(200))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY STA_CFT
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE STA_CFT:'EMPLOYEES4.BAD'
    DISCARDFILE STA_CFT:'EMPLOYEES4.DSC'
    LOGFILE STA_CFT:'EMPLOYEES4.LOG'
    SKIP 1
    FIELDS TERMINATED BY ';' LRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS (
      COLUMN_ID
      ,HOST_COLUMN_COD
      ,COLUMN_COD
      ,TYPE_TXT
      ,COLSIZE_NUM
      ,FXV_TXT))
  LOCATION (STA_CFT:'EMPLOYEES4.CSV'))
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;
The structure configuration view
• Create the structure configuration view based on the previous configuration table.
• Its only addition is to calculate the limits of the fixed columns of the data file
using an analytic function.

CREATE OR REPLACE VIEW STA_EMPLOYEES4_CXV AS
SELECT
COLUMN_ID
,HOST_COLUMN_COD
,COLUMN_COD
,TYPE_TXT
,COLSIZE_NUM
,FXV_TXT
,(SUM (COLSIZE_NUM) OVER (ORDER BY TO_NUMBER (COLUMN_ID))) - COLSIZE_NUM + 1 AS FROM_NUM
,SUM (COLSIZE_NUM) OVER (ORDER BY TO_NUMBER (COLUMN_ID)) AS TO_NUM
FROM STA_EMPLOYEES4_CXT
ORDER BY TO_NUMBER (COLUMN_ID);
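The analytic SUM ... OVER (ORDER BY COLUMN_ID) is just a running sum of the column sizes. Sketched in Python (illustration only, with the COLSIZE_NUM values from the definition file):

```python
# Column names and COLSIZE_NUM values from employees4.csv
names = ["EMPLOYEE_ID", "FIRST_NAME", "LAST_NAME", "EMAIL", "PHONE_NUMBER",
         "HIRE_DATE", "JOB_ID", "SALARY", "COMMISSION_PCT", "MANAGER_ID",
         "DEPARTMENT_ID"]
sizes = [6, 20, 25, 25, 20, 10, 10, 9, 4, 6, 4]

# Running sum of sizes gives TO_NUM; FROM_NUM is TO_NUM - size + 1
limits, running = {}, 0
for name, size in zip(names, sizes):
    running += size
    limits[name] = (running - size + 1, running)  # (FROM_NUM, TO_NUM)
```

The computed limits match the POSITION clauses of the external table on the next slide, e.g. EMPLOYEE_ID at (1, 6), HIRE_DATE at (97, 106) and DEPARTMENT_ID at (136, 139).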
The source external table
• Create the external table linked to the source
data file.
• The names and types of the columns have to
be the same as in the configuration view.
• ROW_CNT uses a handy feature of Oracle
external tables (RECNUM) to give a number to
every row.
• ROW_TXT is the entire row, without
restriction. It will be used in the following
view.

DROP TABLE STA_EMPLOYEES4_FXT;
CREATE TABLE STA_EMPLOYEES4_FXT (
  EMPLOYEE_ID     VARCHAR2(11)
  ,FIRST_NAME     VARCHAR2(20)
  ,LAST_NAME      VARCHAR2(25)
  ,EMAIL          VARCHAR2(25)
  ,PHONE_NUMBER   VARCHAR2(20)
  ,HIRE_DATE      VARCHAR2(10)
  ,JOB_ID         VARCHAR2(10)
  ,SALARY         VARCHAR2(9)
  ,COMMISSION_PCT VARCHAR2(14)
  ,MANAGER_ID     VARCHAR2(10)
  ,DEPARTMENT_ID  VARCHAR2(13)
  ,ROW_CNT        NUMBER
  ,ROW_TXT        VARCHAR2(4000))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY STA_BCK
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE STA_LOG:'EMPLOYEES4.BAD'
    DISCARDFILE STA_LOG:'EMPLOYEES4.DSC'
    LOGFILE STA_LOG:'EMPLOYEES4.LOG'
    FIELDS TERMINATED BY '' LRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS (
      EMPLOYEE_ID     POSITION(1:6)
      ,FIRST_NAME     POSITION(7:26)
      ,LAST_NAME      POSITION(27:51)
      ,EMAIL          POSITION(52:76)
      ,PHONE_NUMBER   POSITION(77:96)
      ,HIRE_DATE      POSITION(97:106)
      ,JOB_ID         POSITION(107:116)
      ,SALARY         POSITION(117:125)
      ,COMMISSION_PCT POSITION(126:129)
      ,MANAGER_ID     POSITION(130:135)
      ,DEPARTMENT_ID  POSITION(136:139)
      ,ROW_CNT        RECNUM
      ,ROW_TXT        POSITION(1:139)))
  LOCATION (STA_BCK:'employees4.txt'))
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;
The source external view (1)
• The goal of the view is to prepare the data to be loaded into the staging table.
• It uses the handy SQL «with» clause to build the needed information. See the
single sub-query blocks in detail:
– T1 = gets the name of the source data file from a table of the Oracle dictionary
– T2 = gets the reference day of the data from the position configured for the
source
– T3 = gets the number of rows declared in the file footer. The final -0
means that there is no offset from the tail of the file.
– T4 = gets the number of rows using the row counter of the external table
– T5 = gets the header/footer row counts
The source external view (2)
The complete SQL statement is:

CREATE OR REPLACE FORCE VIEW STA_EMPLOYEES4_FXV AS
WITH T1 AS (SELECT SUBSTR(LOCATION,1,80) SOURCE_COD
FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES4_FXT')
,T2 AS (SELECT TO_NUMBER(TO_CHAR(TO_DATE(SUBSTR(ROW_TXT,9,10),'dd/mm/yyyy'),'yyyymmdd')) DAY_KEY
FROM STA_EMPLOYEES4_FXT WHERE ROW_CNT = 1)
,T3 AS (SELECT TO_NUMBER(SUBSTR(ROW_TXT,19,13)) ROWS_NUM
FROM STA_EMPLOYEES4_FXT WHERE ROW_CNT=(SELECT MAX(ROW_CNT) FROM STA_EMPLOYEES4_FXT)-0)
,T4 AS (SELECT MAX(ROW_CNT) R FROM STA_EMPLOYEES4_FXT)
,T5 AS (SELECT HEAD_CNT X,FOO_CNT Y FROM STA_IO_CFT WHERE IO_COD = 'employees4')
SELECT TO_NUMBER(EMPLOYEE_ID) EMPLOYEE_ID
,FIRST_NAME FIRST_NAME
,LAST_NAME LAST_NAME
,EMAIL EMAIL
,REPLACE(PHONE_NUMBER,'.','') PHONE_NUMBER
,TO_NUMBER(TO_CHAR(TO_DATE(HIRE_DATE,'dd/mm/yyyy'),'yyyymmdd')) HIRE_DATE
,JOB_ID JOB_ID
,TO_NUMBER(SALARY) SALARY
,TO_NUMBER(COMMISSION_PCT,'99.99') COMMISSION_PCT
,TO_NUMBER(MANAGER_ID) MANAGER_ID
,TO_NUMBER(DEPARTMENT_ID) DEPARTMENT_ID
,SOURCE_COD
,DAY_KEY
,ROWS_NUM
FROM STA_EMPLOYEES4_FXT,T1,T2,T3,T4,T5
WHERE ROW_CNT > X AND ROW_CNT <= R-Y;
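The final WHERE clause is what strips the header and footer: with X header rows, Y footer rows and R total rows, only detail rows survive. A tiny Python sketch of the same filter (illustration only, using this recipe's values):

```python
# head_cnt (X) and foo_cnt (Y) from STA_IO_CFT for 'employees4'
X, Y = 4, 1
# max(ROW_CNT): 4 header rows + 9 detail rows + 1 footer row in employees4.txt
R = 14

# Mirror of: WHERE ROW_CNT > X AND ROW_CNT <= R - Y
detail = [r for r in range(1, R + 1) if r > X and r <= R - Y]
```

The filter keeps rows 5 through 13, i.e. exactly the 9 detail records, which matches the count declared in the footer.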
The Staging table
• The Staging table will be loaded from
the previous view.
• It has three technical fields that keep the
name of the source data file (SOURCE_COD),
the reference day (DAY_KEY), and the
number of rows (ROWS_NUM).
• The number of rows could be omitted (it is
the same for all records), but it is useful
for statistical checks.

DROP TABLE STA_EMPLOYEES4_STT;
CREATE TABLE STA_EMPLOYEES4_STT (
  EMPLOYEE_ID    NUMBER,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      NUMBER,
  JOB_ID         VARCHAR2(10),
  SALARY         NUMBER,
  COMMISSION_PCT NUMBER,
  MANAGER_ID     NUMBER,
  DEPARTMENT_ID  NUMBER,
  SOURCE_COD     VARCHAR2(320),
  DAY_KEY        VARCHAR2(8),
  ROWS_NUM       NUMBER
);
The final load
• We are at the end of this recipe. Now we can do the final load with a simple SQL
statement:

INSERT INTO STA_EMPLOYEES4_STT
SELECT * FROM STA_EMPLOYEES4_FXV;
• I underline the following features:
– Everything is done without an ETL tool
– The only physical structure created in the DWH is the final staging table
– Everything is controlled by logical structures (external tables and views)
– Everything is done without writing any procedural code
– If you create a SQL script from this recipe, you will load the staging table with
a click

Email - massimo_cenci@yahoo.it
Blog (italian/english) - http://massimocenci.blogspot.it/

More Related Content

What's hot

What's hot (20)

Data Warehouse and Business Intelligence - Recipe 7 - A messaging system for ...
Data Warehouse and Business Intelligence - Recipe 7 - A messaging system for ...Data Warehouse and Business Intelligence - Recipe 7 - A messaging system for ...
Data Warehouse and Business Intelligence - Recipe 7 - A messaging system for ...
 
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
 
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
 
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
 
Multiple files single target single interface
Multiple files single target single interfaceMultiple files single target single interface
Multiple files single target single interface
 
Dbms lab Manual
Dbms lab ManualDbms lab Manual
Dbms lab Manual
 
DataBase Management System Lab File
DataBase Management System Lab FileDataBase Management System Lab File
DataBase Management System Lab File
 
T-SQL Overview
T-SQL OverviewT-SQL Overview
T-SQL Overview
 
Sql loader good example
Sql loader good exampleSql loader good example
Sql loader good example
 
Oracle sql loader utility
Oracle sql loader utilityOracle sql loader utility
Oracle sql loader utility
 
New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012 New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012
 
Multiple Flat Files(CSV) to Target Table in ODI12c(12.2.1.0.0)
Multiple Flat Files(CSV) to Target Table in ODI12c(12.2.1.0.0)Multiple Flat Files(CSV) to Target Table in ODI12c(12.2.1.0.0)
Multiple Flat Files(CSV) to Target Table in ODI12c(12.2.1.0.0)
 
ODI 11g - Multiple Flat Files to Oracle DB Table by taking File Name dynamica...
ODI 11g - Multiple Flat Files to Oracle DB Table by taking File Name dynamica...ODI 11g - Multiple Flat Files to Oracle DB Table by taking File Name dynamica...
ODI 11g - Multiple Flat Files to Oracle DB Table by taking File Name dynamica...
 
Oracle notes
Oracle notesOracle notes
Oracle notes
 
MySQL Replication Evolution -- Confoo Montreal 2017
MySQL Replication Evolution -- Confoo Montreal 2017MySQL Replication Evolution -- Confoo Montreal 2017
MySQL Replication Evolution -- Confoo Montreal 2017
 
Getting Started with MySQL I
Getting Started with MySQL IGetting Started with MySQL I
Getting Started with MySQL I
 
Sql introduction
Sql introductionSql introduction
Sql introduction
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query Optimizations
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
 
Oracle Database 12.1.0.2 New Features
Oracle Database 12.1.0.2 New FeaturesOracle Database 12.1.0.2 New Features
Oracle Database 12.1.0.2 New Features
 

Similar to Data Warehouse and Business Intelligence - Recipe 1

New Features of SQL Server 2016
New Features of SQL Server 2016New Features of SQL Server 2016
New Features of SQL Server 2016
Mir Mahmood
 
PBDJ 19-4(woolley rev)
PBDJ 19-4(woolley rev)PBDJ 19-4(woolley rev)
PBDJ 19-4(woolley rev)
Buck Woolley
 
Kåre Rude Andersen - Be a hero – optimize scom and present your services
Kåre Rude Andersen - Be a hero – optimize scom and present your servicesKåre Rude Andersen - Be a hero – optimize scom and present your services
Kåre Rude Andersen - Be a hero – optimize scom and present your services
Nordic Infrastructure Conference
 

Similar to Data Warehouse and Business Intelligence - Recipe 1 (20)

database.ppt
database.pptdatabase.ppt
database.ppt
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptx
 
Chapter 4 Structured Query Language
Chapter 4 Structured Query LanguageChapter 4 Structured Query Language
Chapter 4 Structured Query Language
 
Sql
SqlSql
Sql
 
New Features of SQL Server 2016
New Features of SQL Server 2016New Features of SQL Server 2016
New Features of SQL Server 2016
 
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
 
SQL Queries Information
SQL Queries InformationSQL Queries Information
SQL Queries Information
 
PBDJ 19-4(woolley rev)
PBDJ 19-4(woolley rev)PBDJ 19-4(woolley rev)
PBDJ 19-4(woolley rev)
 
plsql Les09
 plsql Les09 plsql Les09
plsql Les09
 
Les09
Les09Les09
Les09
 
Dbms oracle
Dbms oracle Dbms oracle
Dbms oracle
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
 
Rdbms day3
Rdbms day3Rdbms day3
Rdbms day3
 
Les09 (using ddl statements to create and manage tables)
Les09 (using ddl statements to create and manage tables)Les09 (using ddl statements to create and manage tables)
Les09 (using ddl statements to create and manage tables)
 
mis4200notes4_2.ppt
mis4200notes4_2.pptmis4200notes4_2.ppt
mis4200notes4_2.ppt
 
Data Manipulation(DML) and Transaction Control (TCL)
Data Manipulation(DML) and Transaction Control (TCL)  Data Manipulation(DML) and Transaction Control (TCL)
Data Manipulation(DML) and Transaction Control (TCL)
 
Les09
Les09Les09
Les09
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
Getting started with Pandas Cheatsheet.pdf
Getting started with Pandas Cheatsheet.pdfGetting started with Pandas Cheatsheet.pdf
Getting started with Pandas Cheatsheet.pdf
 
Kåre Rude Andersen - Be a hero – optimize scom and present your services
Kåre Rude Andersen - Be a hero – optimize scom and present your servicesKåre Rude Andersen - Be a hero – optimize scom and present your services
Kåre Rude Andersen - Be a hero – optimize scom and present your services
 

More from Massimo Cenci

Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrong
Massimo Cenci
 

More from Massimo Cenci (16)

Il controllo temporale dei data file in staging area
Il controllo temporale dei data file in staging areaIl controllo temporale dei data file in staging area
Il controllo temporale dei data file in staging area
 
Tecniche di progettazione della staging area in un processo etl
Tecniche di progettazione della staging area in un processo etlTecniche di progettazione della staging area in un processo etl
Tecniche di progettazione della staging area in un processo etl
 
Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...
Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...
Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...
 
Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...
Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...
Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...
 
Note di Data Warehouse e Business Intelligence - Pensare "Agile"
Note di Data Warehouse e Business Intelligence - Pensare "Agile"Note di Data Warehouse e Business Intelligence - Pensare "Agile"
Note di Data Warehouse e Business Intelligence - Pensare "Agile"
 
Note di Data Warehouse e Business Intelligence - La gestione delle descrizioni
Note di Data Warehouse e Business Intelligence - La gestione delle descrizioniNote di Data Warehouse e Business Intelligence - La gestione delle descrizioni
Note di Data Warehouse e Business Intelligence - La gestione delle descrizioni
 
Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...
Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...
Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...
 
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
 
Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrong
 
Letter to a programmer
Letter to a programmerLetter to a programmer
Letter to a programmer
 
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Oracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sqlOracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sql
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisiNote di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Data Warehouse and Business Intelligence - Recipe 1

  • 1. Recipes of Data Warehouse and Business Intelligence Load a Data Source File (with header, footer and fixed lenght columns) into a Staging Area table with a click
  • 2. The Micro ETL Foundation • • • The Micro ETL Foundation is a set of ideas and solutions for Data Warehouse and Business Intelligence Projects in Oracle environment. It doesn’t use expensive ETL tools, but only your intelligence and ability to think, configure, build and load data using the features and the programming language of your RDBMS. This recipe is an easy example. Copying the content of the following slides with your editor and SQL Interface utility, you can reproduce this example.
  • 3. The source data file • • • • • Get the data file to load. In this recipe we use a data file with these features: Four initial rows like header. The reference day of the data is in the first row with the «dd/mm/yyyy» format. One tail row with the number of records of the data file. Columns of fixed size (we will configure later). The next figure is the content of the data file that we call employees4.txt BANKIN1431/12/20130000 BEGINHEADER BANKIN1400 EMPLOYEES ENDHEADER 100Steven King SKING 101Neena Kochhar NKOCHHAR 102Lex De Haan LDEHAAN 145John Russell JRUSSEL 146Karen Partners KPARTNER 147Alberto Errazuriz AERRAZUR 148Gerald Cambrault GCAMBRAU 149Eleni Zlotkey EZLOTKEY 150Peter Tucker PTUCKER BANKIN1431/12/201300000000009 5.151.234.567 5.151.234.568 5.151.234.569 011.44.1344.429268 011.44.1344.467268 011.44.1344.429278 011.44.1344.619268 011.44.1344.429018 011.44.1344.129268 17/06/2003AD_PRES 21/09/2005AD_VP 13/01/2001AD_VP 01/10/2004SA_MAN 05/01/2005SA_MAN 10/03/2005SA_MAN 15/10/2007SA_MAN 29/01/2008SA_MAN 30/01/2005SA_REP 24000 17000 17000 14000 13500 12000 11000 10500 10000 100 100 0.04100 0.03100 0.03100 0.03100 0.02100 0.03145 90 90 90 80 80 80 80 80 80
  • 4. The definition file • • • • • • • Build the definition file from your documentation. It has to be a «csv» file because it must be seen by an external table. For this example we define the minimum set of information. COLUMN_COD will be the name of the column in the DWH. FXV_TXT contains little transformations to be done. COLSIZE_NUM is the size of the column in the data file. The next is the content of the definition file that we call employees4.csv COLUMN_ID HOST_COLUMN_COD 1 EMPLOYEE_ID 2 FIRST_NAME 3 LAST_NAME 4 EMAIL 5 PHONE_NUMBER 6 HIRE_DATE 7 JOB_ID 8 SALARY 9 COMMISSION_PCT 10 MANAGER_ID 11 DEPARTMENT_ID COLUMN_COD EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID TYPE_TXT COLSIZE_NUM FXV_TXT NUMBER (6) 6 to_number(EMPLOYEE_ID) VARCHAR2(20) 20 VARCHAR2(25) 25 VARCHAR2(25) 25 VARCHAR2(20) 20 replace(PHONE_NUMBER,'.','') NUMBER 10 TO_NUMBER(to_char(to_date(HIRE_DATE,'dd/mm/yyyy'),'yyyymmdd')) VARCHAR2(10) 10 NUMBER (8,2) 9 to_number(SALARY) NUMBER (2,2) 4 to_number(COMMISSION_PCT,'99.99') NUMBER (6) 6 to_number(MANAGER_ID) NUMBER (4) 4 to_number(DEPARTMENT_ID)
  • 5. The physical/logical environment • • • Create two Operating System folders. The first for the data file and the second for the configuration file. (C:ios and c:ioscft) Create some Oracle directories needed for the external tables definition. Position the data and the configuration file in the folders. DROP DIRECTORY STA_BCK; CREATE DIRECTORY STA_BCK AS 'c:ios'; DROP DIRECTORY STA_LOG; CREATE DIRECTORY STA_LOG AS 'c:ios'; DROP DIRECTORY STA_RCV; CREATE DIRECTORY STA_RCV AS 'c:ios'; DROP DIRECTORY STA_CFT; CREATE DIRECTORY STA_CFT AS 'c:ioscft'; DROP DIRECTORY STA_CFT_LOG; CREATE DIRECTORY STA_CFT_LOG AS 'c:ioscft';
  • 6. The source configuration table • • • • • Create the configuration table of the data source showed in the slide 3 It contains the unique identificator of data source (IO_ID) It contains the folder references (*_DIR) It contains the information about the format of different types of data source Only some fields will be configured. DROP TABLE STA_IO_CFT; CREATE TABLE STA_IO_CFT ( IO_COD VARCHAR2(12), RCV_DIR VARCHAR2(30), BCK_DIR VARCHAR2(30), LOG_DIR VARCHAR2(30), HEAD_CNT NUMBER, FOO_CNT NUMBER, SEP_TXT VARCHAR2(1), IDR_NUM NUMBER, IDC_NUM NUMBER, IDS_NUM NUMBER, IDF_TXT VARCHAR2(30), EDC_NUM NUMBER, EDS_NUM NUMBER, EDF_TXT VARCHAR2(30), RCR_NUM NUMBER, RCC_NUM NUMBER, RCS_NUM NUMBER, RCF_LIKE_TXT VARCHAR2(30), FILE_LIKE_TXT VARCHAR2(60) );
7. The load of the configuration table
• Load the previous table according to the features of slide 3:
• The folder references (rcv_dir, bck_dir, log_dir).
• The name of the source file (file_like_txt).
• The number of header rows (head_cnt) and footer rows (foo_cnt).
• The separator character (sep_txt), null because it is not a csv file.
• The row, position and size, in the header, of the reference day of the source, and its format (idr_num, idc_num, ids_num, idf_txt).
• The offset from the tail, the position and the size, in the footer section, of the number of rows of the source (rcr_num, rcc_num, rcs_num).

DELETE STA_IO_CFT WHERE IO_COD = 'employees4';
INSERT INTO STA_IO_CFT (
  IO_COD
 ,RCV_DIR, BCK_DIR, LOG_DIR
 ,FILE_LIKE_TXT
 ,HEAD_CNT, FOO_CNT, SEP_TXT
 ,IDR_NUM, IDC_NUM, IDS_NUM, IDF_TXT
 ,RCR_NUM, RCC_NUM, RCS_NUM
) VALUES (
  'employees4'
 ,'STA_RCV', 'STA_BCK', 'STA_LOG'
 ,'employees4.txt'
 ,4, 1, NULL
 ,1, 9, 10, 'DD/MM/YYYY'
 ,0, 19, 13
);
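The header/footer coordinates configured above can be checked by hand. A minimal Python sketch (not part of the recipe; the sample header and footer rows are copied from employees4.txt in slide 3, and the offsets are the configured IDC_NUM=9/IDS_NUM=10 and RCC_NUM=19/RCS_NUM=13) of how the reference day and the declared row count are extracted:

```python
from datetime import datetime

# First and last rows of employees4.txt (see slide 3)
header = "BANKIN1431/12/20130000"
footer = "BANKIN1431/12/201300000000009"

# IDC_NUM=9, IDS_NUM=10, IDF_TXT='DD/MM/YYYY':
# the reference day starts at position 9 and is 10 characters long
day = datetime.strptime(header[8:8 + 10], "%d/%m/%Y")
day_key = int(day.strftime("%Y%m%d"))

# RCC_NUM=19, RCS_NUM=13:
# the declared number of rows starts at position 19 of the footer
rows_num = int(footer[18:18 + 13])

print(day_key, rows_num)  # 20131231 9
```

These are exactly the values that the external view of slide 12 will compute with SUBSTR/TO_DATE/TO_NUMBER.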
8. The configuration table of the definition file
• Create the configuration (external) table of the data structure shown in slide 4.
• It is a metadata table.
• You can add other info, like the column description.

DROP TABLE STA_EMPLOYEES4_CXT;
CREATE TABLE STA_EMPLOYEES4_CXT (
  COLUMN_ID       VARCHAR2(4),
  HOST_COLUMN_COD VARCHAR2(30),
  COLUMN_COD      VARCHAR2(30),
  TYPE_TXT        VARCHAR2(30),
  COLSIZE_NUM     VARCHAR2(4),
  FXV_TXT         VARCHAR2(200))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY STA_CFT
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE STA_CFT:'EMPLOYEES4.BAD'
    DISCARDFILE STA_CFT:'EMPLOYEES4.DSC'
    LOGFILE STA_CFT:'EMPLOYEES4.LOG'
    SKIP 1
    FIELDS TERMINATED BY ';' LRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS (
      COLUMN_ID
     ,HOST_COLUMN_COD
     ,COLUMN_COD
     ,TYPE_TXT
     ,COLSIZE_NUM
     ,FXV_TXT))
  LOCATION (STA_CFT:'EMPLOYEES4.CSV'))
REJECT LIMIT UNLIMITED
NOPARALLEL NOMONITORING;
9. The structure configuration view
• Create the structure configuration view based on the previous configuration table.
• In addition, it calculates the boundaries (FROM_NUM, TO_NUM) of the fixed columns of the data file using an analytic function.

CREATE OR REPLACE VIEW STA_EMPLOYEES4_CXV AS
SELECT COLUMN_ID
      ,HOST_COLUMN_COD
      ,COLUMN_COD
      ,TYPE_TXT
      ,COLSIZE_NUM
      ,FXV_TXT
      ,(SUM (COLSIZE_NUM) OVER (ORDER BY TO_NUMBER (COLUMN_ID))) - COLSIZE_NUM + 1 AS FROM_NUM
      ,SUM (COLSIZE_NUM) OVER (ORDER BY TO_NUMBER (COLUMN_ID)) AS TO_NUM
  FROM STA_EMPLOYEES4_CXT
 ORDER BY TO_NUMBER (COLUMN_ID);
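The analytic SUM is just a running total of the column sizes. A small Python sketch (not part of the recipe; the COLSIZE_NUM values are copied from employees4.csv) of how FROM_NUM/TO_NUM are derived:

```python
# COLSIZE_NUM per column, in COLUMN_ID order (from the definition file)
sizes = [6, 20, 25, 25, 20, 10, 10, 9, 4, 6, 4]

def boundaries(col_sizes):
    """Mirror the analytic SUM: the cumulative size gives TO_NUM,
    and TO_NUM - size + 1 gives FROM_NUM (1-based, inclusive)."""
    out, running = [], 0
    for s in col_sizes:
        out.append((running + 1, running + s))
        running += s
    return out

print(boundaries(sizes)[:3])   # [(1, 6), (7, 26), (27, 51)]
print(boundaries(sizes)[-1])   # (136, 139) -> the record is 139 characters wide
```

These pairs are exactly the POSITION(m:n) clauses of the external table in the next slide.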
10. The source external table
• Create the external table linked to the source data file.
• The names and types of the columns have to be the same as in the configuration view.
• ROW_CNT uses a useful feature of Oracle external tables (RECNUM) to number every row.
• ROW_TXT is the entire row without restriction. It will be used in the following view.

DROP TABLE STA_EMPLOYEES4_FXT;
CREATE TABLE STA_EMPLOYEES4_FXT (
  EMPLOYEE_ID    VARCHAR2(11)
 ,FIRST_NAME     VARCHAR2(20)
 ,LAST_NAME      VARCHAR2(25)
 ,EMAIL          VARCHAR2(25)
 ,PHONE_NUMBER   VARCHAR2(20)
 ,HIRE_DATE      VARCHAR2(10)
 ,JOB_ID         VARCHAR2(10)
 ,SALARY         VARCHAR2(9)
 ,COMMISSION_PCT VARCHAR2(14)
 ,MANAGER_ID     VARCHAR2(10)
 ,DEPARTMENT_ID  VARCHAR2(13)
 ,ROW_CNT        NUMBER
 ,ROW_TXT        VARCHAR2(4000))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY STA_BCK
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE STA_LOG:'EMPLOYEES4.BAD'
    DISCARDFILE STA_LOG:'EMPLOYEES4.DSC'
    LOGFILE STA_LOG:'EMPLOYEES4.LOG'
    FIELDS LRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS (
      EMPLOYEE_ID    POSITION(1:6)
     ,FIRST_NAME     POSITION(7:26)
     ,LAST_NAME      POSITION(27:51)
     ,EMAIL          POSITION(52:76)
     ,PHONE_NUMBER   POSITION(77:96)
     ,HIRE_DATE      POSITION(97:106)
     ,JOB_ID         POSITION(107:116)
     ,SALARY         POSITION(117:125)
     ,COMMISSION_PCT POSITION(126:129)
     ,MANAGER_ID     POSITION(130:135)
     ,DEPARTMENT_ID  POSITION(136:139)
     ,ROW_CNT        RECNUM
     ,ROW_TXT        POSITION(1:139)))
  LOCATION (STA_BCK:'employees4.txt'))
REJECT LIMIT UNLIMITED
NOPARALLEL NOMONITORING;
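The POSITION(m:n) clauses simply carve fixed-width slices out of each record. A hypothetical Python sketch (not part of the recipe; a synthetic record padded to the widths above, first three fields only, with LRTRIM emulated by strip):

```python
# Build a synthetic fixed-width record (padding widths as in the external table)
record = "100".ljust(6) + "Steven".ljust(20) + "King".ljust(25)

def field(rec, m, n):
    """Emulate POSITION(m:n): 1-based inclusive slice, blanks trimmed (LRTRIM)."""
    return rec[m - 1:n].strip()

employee_id = field(record, 1, 6)    # '100'
first_name  = field(record, 7, 26)   # 'Steven'
last_name   = field(record, 27, 51)  # 'King'
print(employee_id, first_name, last_name)  # 100 Steven King
```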
11. The source external view (1)
• The goal of the view is to prepare the data to be loaded into the staging table.
• It uses the useful SQL «with» clause to build the information needed. See in detail the single sub-query blocks:
– T1 = get the name of the source data file using a table of the Oracle dictionary
– T2 = get the reference day of the data using the info of the source definition table
– T3 = get the number of rows declared in the file footer. The final -0 means that there is no offset from the tail of the file.
– T4 = get the number of rows using the row counter of the external table
– T5 = get the header/footer row counts
12. The source external view (2)
• The complete SQL statement is:

CREATE OR REPLACE FORCE VIEW STA_EMPLOYEES4_FXV AS
WITH T1 AS (SELECT SUBSTR(LOCATION,1,80) SOURCE_COD
              FROM USER_EXTERNAL_LOCATIONS
             WHERE TABLE_NAME = 'STA_EMPLOYEES4_FXT')
    ,T2 AS (SELECT TO_NUMBER(TO_CHAR(TO_DATE(SUBSTR(ROW_TXT,9,10),'dd/mm/yyyy'),'yyyymmdd')) DAY_KEY
              FROM STA_EMPLOYEES4_FXT
             WHERE ROW_CNT = 1)
    ,T3 AS (SELECT TO_NUMBER(SUBSTR(ROW_TXT,19,13)) ROWS_NUM
              FROM STA_EMPLOYEES4_FXT
             WHERE ROW_CNT = (SELECT MAX(ROW_CNT) FROM STA_EMPLOYEES4_FXT) - 0)
    ,T4 AS (SELECT MAX(ROW_CNT) R FROM STA_EMPLOYEES4_FXT)
    ,T5 AS (SELECT HEAD_CNT X, FOO_CNT Y FROM STA_IO_CFT WHERE IO_COD = 'employees4')
SELECT TO_NUMBER(EMPLOYEE_ID) EMPLOYEE_ID
      ,FIRST_NAME
      ,LAST_NAME
      ,EMAIL
      ,REPLACE(PHONE_NUMBER,'.','') PHONE_NUMBER
      ,TO_NUMBER(TO_CHAR(TO_DATE(HIRE_DATE,'dd/mm/yyyy'),'yyyymmdd')) HIRE_DATE
      ,JOB_ID
      ,TO_NUMBER(SALARY) SALARY
      ,TO_NUMBER(COMMISSION_PCT,'99.99') COMMISSION_PCT
      ,TO_NUMBER(MANAGER_ID) MANAGER_ID
      ,TO_NUMBER(DEPARTMENT_ID) DEPARTMENT_ID
      ,SOURCE_COD
      ,DAY_KEY
      ,ROWS_NUM
  FROM STA_EMPLOYEES4_FXT, T1, T2, T3, T4, T5
 WHERE ROW_CNT > X
   AND ROW_CNT <= R - Y;
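The final WHERE clause (ROW_CNT > X AND ROW_CNT <= R - Y) just drops the header and footer rows. A minimal Python sketch (not part of the recipe) with a toy file of 4 header rows, 3 data rows and 1 footer row:

```python
lines = ["h1", "h2", "h3", "h4", "d1", "d2", "d3", "f1"]
head_cnt, foo_cnt = 4, 1           # X and Y from STA_IO_CFT
total = len(lines)                 # R, the MAX(ROW_CNT) of block T4

# Keep only rows whose 1-based counter is past the header and before the footer
data = [row for cnt, row in enumerate(lines, start=1)
        if head_cnt < cnt <= total - foo_cnt]
print(data)  # ['d1', 'd2', 'd3']
```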
13. The Staging table
• The staging table will be loaded from the previous view.
• It has 3 technical fields that record the name of the source data file, the reference day and the number of rows.
• The rows number could be omitted (it is the same for all records), but it is useful for statistical checks.

DROP TABLE STA_EMPLOYEES4_STT;
CREATE TABLE STA_EMPLOYEES4_STT (
  EMPLOYEE_ID    NUMBER,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      NUMBER,
  JOB_ID         VARCHAR2(10),
  SALARY         NUMBER,
  COMMISSION_PCT NUMBER,
  MANAGER_ID     NUMBER,
  DEPARTMENT_ID  NUMBER,
  SOURCE_COD     VARCHAR2(320),
  DAY_KEY        VARCHAR2(8),
  ROWS_NUM       NUMBER
);
14. The final load
• We are at the end of this recipe. Now we can do the final load with a simple SQL statement:

INSERT INTO STA_EMPLOYEES4_STT
SELECT * FROM STA_EMPLOYEES4_FXV;

• I underline the following features:
– All is done without an ETL tool
– The only physical structure created in the DWH is the final staging table
– Everything is controlled by logical structures (external tables and views)
– Everything without writing any procedural code
– If you create a SQL script from this recipe, you will load the staging table with a click

Email - massimo_cenci@yahoo.it
Blog (italian/english) - http://massimocenci.blogspot.it/