How to monitor and analyze a data file before loading in Staging Area
Recipes of Data Warehouse and Business Intelligence
Analysis of the use case
• This use case, which is part of the Micro ETL Foundation (MEF), covers a
preliminary analysis of the data files.
• Configuring and loading the first test data file can be very time-consuming.
Very often the data file is full of special characters and other anomalies
that prevent it from loading.
• Sometimes the data file loads correctly into the Staging Area, but the
presence of "dirty" and/or special characters later causes problems in the
Business Intelligence interfaces or in the HTML reports.
• I will therefore provide a tool that helps identify problems in data files
immediately. This allows us to alert the feeding system and indicate very
precisely where the anomalies are.
Changes to MEF repository
• No existing objects of the MEF repository are changed. Download the package from
(https://drive.google.com/open?id=0B2dQ0EtjqAOTZk1Eb3J0UEVsMnc)
The installation is very simple and acts as a plug-in to MEF: only new
structures are added.
• If you don't have MEF, take the latest version from
(https://drive.google.com/open?id=0B2dQ0EtjqAOTNjZlUFR0NkIyQm8)
and follow the readme file for the installation.
• The new table MEF_ASCII_CFT is very useful. It is loaded with the 256
ASCII codes and their descriptions. A fragment is shown in the next figure.
• The table MEF_ANAFILE_LOT will contain the result of the analysis of the
data file. It holds only the lines that showed anomalies: "dirty"
characters, or an incorrect number of column separators (only for data
files of csv type).
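The two kinds of anomaly described above (dirty characters and a wrong number of column separators) can be illustrated outside Oracle. The following is a minimal Python sketch, not the MEF implementation: the function name find_anomalies, the choice to treat any byte outside printable ASCII (32 to 126) as "dirty", and the shape of the result are assumptions made for illustration only.

```python
from typing import List, Tuple


def find_anomalies(rows: List[bytes], sep: int, expected_seps: int) -> List[Tuple[int, str]]:
    """Return (row number, reason) for every row that looks abnormal.

    Assumption: a byte is "dirty" when it lies outside printable ASCII (32..126).
    """
    anomalies = []
    for row_no, row in enumerate(rows, start=1):
        # Iterating over bytes yields integer codes, like the decimal codes in the text.
        dirty = sorted({code for code in row if code < 32 or code > 126})
        if dirty:
            anomalies.append((row_no, f"dirty character codes: {dirty}"))
        found = row.count(bytes([sep]))
        if found != expected_seps:
            anomalies.append((row_no, f"{found} separators, expected {expected_seps}"))
    return anomalies


# Hypothetical sample rows: row 2 contains a NUL byte, row 3 lacks a separator.
rows = [b"1,Rome,IT", b"2,Mil\x00an,IT", b"3,Paris"]
for row_no, reason in find_anomalies(rows, sep=44, expected_seps=2):
    print(row_no, reason)
```

Only the rows with a problem are reported, which mirrors the idea of storing just the abnormal lines.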
Execute the use case
To test, simply run the procedure mef_dfana.p_analyze. Take care with the
input parameters:
p_io_cod - data file code - This parameter does not affect the procedure. It
can be useful as a link to the configuration table of the data file.
p_dir - file folder - Oracle directory for the path of the data file. For
example, DWH_DAT.
p_file - file name - Name of the data file. We can try a data file from a
previous use case, for example regmar_20160205.csv
p_t1 - decimal code of the 1st terminator - The end of a row may be marked by
more than one character. Indicate the decimal code of the first character. For
example, specify 10 if rows end with a line feed <LF>, or 13 if they end with
<CR><LF>.
p_t2 - decimal code of the 2nd terminator, if it exists - Indicate the decimal
code of the second character. For example, specify 10 for the line feed <LF>
of a <CR><LF> pair.
p_t3 - decimal code of the 3rd terminator, if it exists - Indicate the decimal
code of the third character.
p_sep - decimal code of the column separator - Indicate the decimal code of
the field separator, if there is one (59 = ";", 44 = ",").
p_sep_cnt - counter of separators - Indicate the expected number of field
separators. Generally it is the number of fields minus one, but sometimes the
last field is also followed by a separator, in which case it equals the number
of fields. If you are unsure, leave this parameter null: the procedure will
use the number of separators found in the first row.
p_from - analyze row from - Number of the line from which the analysis of
the data file will start. The default is 1.
p_to - analyze row to - Number of the last line to be analyzed. If null, the
entire file is analyzed. For a very large data file, it can be useful to try
a reduced number of rows first.
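To make the meaning of these parameters concrete, here is a small Python sketch of how terminator codes, a separator code and a row range could be interpreted. It is an illustration under stated assumptions, not the code of mef_dfana.p_analyze: the helper names terminator, select_rows and expected_separators are hypothetical, and the fallback is assumed to look at the first row of the file.

```python
from typing import List, Optional


def terminator(t1: int, t2: Optional[int] = None, t3: Optional[int] = None) -> bytes:
    """Build the end-of-row byte sequence from up to three decimal codes."""
    return bytes(code for code in (t1, t2, t3) if code is not None)


def select_rows(data: bytes, term: bytes, row_from: int = 1,
                row_to: Optional[int] = None) -> List[bytes]:
    """Split data on the terminator and keep rows row_from..row_to (1-based, inclusive)."""
    rows = data.split(term)
    if rows and rows[-1] == b"":
        rows.pop()  # drop the empty piece that follows the final terminator
    return rows[row_from - 1:row_to]


def expected_separators(rows: List[bytes], sep: int, sep_cnt: Optional[int] = None) -> int:
    """Use sep_cnt when given; otherwise count the separators in the first row."""
    if sep_cnt is not None:
        return sep_cnt
    return rows[0].count(bytes([sep])) if rows else 0


# Hypothetical sample: rows end with <CR><LF> (13, 10), fields are separated by ";" (59).
data = b"id;name;country\r\n1;Rome;IT\r\n2;Milan\r\n"
term = terminator(13, 10)
all_rows = select_rows(data, term)
print(expected_separators(all_rows, sep=59))
print(select_rows(data, term, row_from=2, row_to=3))
```

Passing only the first code, as in terminator(10), describes a file whose rows end with a single <LF>.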
To check the end-of-row characters, I suggest opening the data file in an
editor with a hexadecimal/decimal view. When opening it, do not convert it to
another format. Usually the terminators are 10 <LF> or 13+10 <CR><LF>. To
test, we can use the data files of the use cases.
SQL> exec mef_dfana.p_analyze('TEST','DWH_DAT','regmar_20160205.csv',10,null,null,44,null);
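If no hexadecimal editor is at hand, the end-of-row characters can also be checked with a few lines of script before choosing the terminator codes. The following Python sketch is only one possible way to do it; the function names count_terminators and count_terminators_in_file are hypothetical. The file is read in binary mode, so nothing is converted on open.

```python
from collections import Counter


def count_terminators(data: bytes) -> Counter:
    """Count <CR><LF> pairs, lone <LF> and lone <CR> in raw bytes."""
    crlf = data.count(b"\r\n")
    return Counter({
        "13+10 <CR><LF>": crlf,
        "10 <LF>": data.count(b"\n") - crlf,
        "13 <CR>": data.count(b"\r") - crlf,
    })


def count_terminators_in_file(path: str) -> Counter:
    # Binary mode ("rb") returns the bytes exactly as stored, with no newline translation.
    with open(path, "rb") as handle:
        return count_terminators(handle.read())


# Hypothetical sample with mixed terminators: two <CR><LF> rows and one <LF> row.
sample = b"a,b\r\nc,d\r\ne,f\n"
print(count_terminators(sample))
```

A file with more than one non-zero counter has mixed terminators, which is itself worth reporting to the feeding system.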
The message log table of MEF:
The result on the view MEF_ANAFILE_LOV:
References
http://www.slideshare.net/jackbim/recipe-14-of-data-warehouse-and-business-intelligence-build-a-staging-area-for-an-oracle-data-warehouse-1
http://massimocenci.blogspot.it/
Email: Massimo_cenci@yahoo.it

Recipe 15 of Data Warehouse and Business Intelligence - How to monitor and analyze a data file before loading in Staging Area with Oracle
