For each calendar day, we must know clearly what we expect to receive on that day and, for any given data file, what reference day we expect to find inside it.
There can be no doubt or ambiguity: this is information that we need to know in advance and that we have to configure.
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the days on the data files of a Data Warehouse
1. How to have the monitoring of the days on the data files of a Data Warehouse
Recipes of Data Warehouse and Business Intelligence
[Cover illustration: a data file being questioned: "Are you the right one? Have you what I expect? Have you lost some piece?"]
2. • In this article we focus on the management of the loading day of the data file, the reference day of the data, and the expected number of rows. These issues have already been covered briefly in some of my previous articles published on SlideShare and on my blog. Now we see the practical application.
• As a real case, we will use, as an example, the data file of the MTF markets (Multilateral Trading Facilities). A "row" file has been associated with the data file; it contains the number of rows expected in the data file itself (a parsing sketch follows this slide).
• The control file, created by hand for this purpose, is composed of three lines:
#MTF CONTROL FILE OF 20160314
ROWS = 160
#END OF MTF CONTROL FILE OF 20160314
• We suppose that the data file should arrive every working day, and that the reference day is the previous working day.
• The reference day is specified in the file name, but we must be careful, because the feeding system sets as reference the day of production of the data file and not the previous working day.
The use case
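As a minimal sketch of how such a ".row" control file can be checked, the PL/SQL block below reads the declared row count and compares it with the rows actually loaded. This is not the MEF implementation: the directory object DAT_DIR and the staging table MTF_STAGING are assumptions of this example.

-- Minimal sketch, assuming a directory object DAT_DIR pointing to the dat folder
-- and a staging table MTF_STAGING already loaded from the data file.
DECLARE
  l_file     UTL_FILE.FILE_TYPE;
  l_line     VARCHAR2(4000);
  l_expected NUMBER;
  l_loaded   NUMBER;
BEGIN
  l_file := UTL_FILE.FOPEN('DAT_DIR', 'mtf_export_20160314.row', 'R');
  BEGIN
    LOOP
      UTL_FILE.GET_LINE(l_file, l_line);
      -- the second line of the control file has the form "ROWS = 160"
      IF l_line LIKE 'ROWS%' THEN
        l_expected := TO_NUMBER(TRIM(SUBSTR(l_line, INSTR(l_line, '=') + 1)));
      END IF;
    END LOOP;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN NULL;   -- end of control file reached
  END;
  UTL_FILE.FCLOSE(l_file);

  SELECT COUNT(*) INTO l_loaded FROM mtf_staging;

  IF l_loaded = l_expected THEN
    DBMS_OUTPUT.PUT_LINE('OK: ' || l_loaded || ' rows, as declared in the control file');
  ELSE
    DBMS_OUTPUT.PUT_LINE('NOT OK: expected ' || l_expected || ' rows, loaded ' || l_loaded);
  END IF;
END;
/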
3. • Based on the information mentioned above, to get full control of the data file loading, the ETL system should provide me with all the information necessary to fulfill the following requirements.
• We must have a clear vision of the characteristics of the data file, both general and purely technical. In particular, those linked to its name, the file structure, the way the reference day is defined, and the structure of the control file (if present).
• So, we will define the temporal characteristics of the data file by using a code that represents its management.
The control requirements
4. • For convenience, I summarize the ways in which the feeding system can tell me the reference day.
The control requirements
[Diagram] Where is the reference day of the data?
- Inside the data file: in a column of the data file, in the heading of the data file, or in the tail of the data file.
- Outside the data file: in the name of the data file, or missing (in which case the system date is assumed).
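As an illustration, when the reference day travels in the file name (as for the MTF data file of this recipe), it can be extracted with a single expression. This is only a sketch; the 8-digit pattern is an assumption based on the mtf_export_YYYYMMDD.csv naming used here.

-- Sketch: extract the reference day carried in the file name
SELECT TO_DATE(REGEXP_SUBSTR('mtf_export_20160314.csv', '[0-9]{8}'), 'YYYYMMDD') AS reference_day
FROM   dual;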
5. • We must have a clear vision of the internal structure of the data file, i.e. the columns that constitute it. And for each column, as much metadata as possible must be present.
• Both static metadata, such as the type or length, and dynamic metadata, such as the presence of a domain of values, or whether the column is part of the unique key.
The control requirements
6. The control requirements
• We must have a calendar table that, for each calendar day, tells me (simply by duplicating the day) whether I expect the arrival of the data file and what the expected reference day inside the data file of that day is.
• If the data file contains more than one day, I need to know the range of days that I expect.
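A minimal sketch of such a per-day expectation table is shown below. The table and column names echo the IODAY_CFT and FR_YMD names used later in this recipe, but the exact structure is an assumption of this example, not the actual MEF DDL.

-- Sketch of a calendar-driven expectation table (structure assumed for illustration)
CREATE TABLE ioday_cft (
  io_cod       VARCHAR2(30),   -- identifier of the data file
  day_ymd      NUMBER(8),      -- calendar day, format YYYYMMDD
  fr_ymd       NUMBER(8),      -- set to the day itself when the file is expected on that day
  from_dr_ymd  NUMBER(8),      -- expected reference day, or start of the expected range
  to_dr_ymd    NUMBER(8)       -- end of the expected range when the file covers several days
);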
7. The control requirements
• We need to know the final outcome of the processing: the final state and the time taken. If the load has had problems, I need to know the error produced and the programming module that generated it.
• If the outcome is negative, we have to know exactly why it is in error. For example, if the consistency check has failed, I need to know at what point it occurred.
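To make the requirement concrete, a processing-outcome log could be as simple as the sketch below; the table and its columns are assumptions for illustration, not the actual MEF log structure.

-- Sketch of a processing-outcome log table (names assumed for illustration)
CREATE TABLE load_log_dgt (
  io_cod      VARCHAR2(30),    -- processing unit / data file identifier
  day_ymd     NUMBER(8),       -- loading day, YYYYMMDD
  status_cod  VARCHAR2(10),    -- final state, e.g. 'OK' or 'NOT OK'
  elapsed_sec NUMBER,          -- time taken by the processing
  error_txt   VARCHAR2(4000),  -- error produced, if any
  module_nam  VARCHAR2(100)    -- programming module that generated the error
);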
8. The control requirements
• We need to know the final outcome of the control about the loading day and the reference day.
• To get the final outcome of the controls, we have to think about implementing a control logic similar to that shown in the next figure.
• In dark green, the definitely correct situations. In red, the alert situations. In light green, the ones presumably correct but that require attention.
9. The control requirements
[Decision tree] For each data file and calendar day, the control answers three questions: Has the data file arrived? Did it have to arrive? Does the expected day match the reference day?
• Arrived = yes, had to arrive = yes: if expected day = reference day -> 1 - OK (arrived and right day); otherwise -> 2 - NOT OK (arrived but wrong day)
• Arrived = yes, had to arrive = no: if expected day = reference day -> 3 - OK (unexpected file); otherwise -> 4 - NOT OK (unexpected file and wrong day)
• Arrived = yes, had to arrive = maybe: if expected day = reference day -> 5 - OK (maybe file); otherwise -> 6 - NOT OK (maybe file and wrong day)
• Arrived = no, had to arrive = yes -> 7 - NOT OK (missing file)
• Arrived = no, had to arrive = no -> 8 - OK (no file to load)
• Arrived = no, had to arrive = maybe -> 9 - OK (maybe file)
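The same matrix can be expressed as a single SQL CASE. This is just a sketch: the FILE_DAY_CHECK view and its ARRIVED / EXPECTED / DAY_MATCH columns are assumptions used to expose the three answers of the figure, with '?' standing for "maybe".

-- Sketch: the day-control matrix as a CASE expression (view and columns assumed)
SELECT io_cod,
       day_ymd,
       CASE
         WHEN arrived = 'Y' AND expected = 'Y' AND day_match = 'Y' THEN '1 - OK (arrived and right day)'
         WHEN arrived = 'Y' AND expected = 'Y' AND day_match = 'N' THEN '2 - NOT OK (arrived but wrong day)'
         WHEN arrived = 'Y' AND expected = 'N' AND day_match = 'Y' THEN '3 - OK (unexpected file)'
         WHEN arrived = 'Y' AND expected = 'N' AND day_match = 'N' THEN '4 - NOT OK (unexpected file and wrong day)'
         WHEN arrived = 'Y' AND expected = '?' AND day_match = 'Y' THEN '5 - OK (maybe file)'
         WHEN arrived = 'Y' AND expected = '?' AND day_match = 'N' THEN '6 - NOT OK (maybe file and wrong day)'
         WHEN arrived = 'N' AND expected = 'Y'                     THEN '7 - NOT OK (missing file)'
         WHEN arrived = 'N' AND expected = 'N'                     THEN '8 - OK (no file to load)'
         WHEN arrived = 'N' AND expected = '?'                     THEN '9 - OK (maybe file)'
       END AS day_check_status
FROM   file_day_check;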
10. The control requirements
• We must receive the result of the processing via e-mail.
• Using the Micro ETL Foundation we can handle this situation and its control in a few steps.
MEF:
Open the link:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Go to the Mef_v2 folder and follow the instructions of the readme file.
The data file is in the folder .. dat and is called mtf_export_20160314.csv. The control file with the expected number of rows is called mtf_export_20160314.row. It is also present in the folder .. dat.
The file that configures the data file fields is located in the folder .. cft and is called mtf.csv.
11. The configuration of the data and control file
• The first step is to insert into a configuration table, which we will call IO_CFT for brevity, all the information we know about the features of the data file we load. In this case, the IO_CFT table must also hold the information relating to the control file.
• The second step is to insert into the IO_CFT table the information relative to the expected day of arrival of the data file. We must define a code, which we will call FR_COD (File Reference Code), behind which lies the load logic of a second configuration table that we will call IODAY_CFT. The FR_COD code represents the arrival frequency. For the moment, I have defined some commonly used values:
• AD = every day. The data file must arrive every day, so in the IODAY_CFT table all the days will be set.
• AWD = all working days. The data file must arrive only on working days, so all holidays plus Saturdays and Sundays will be left null.
• ? = I do not know when it arrives, it is variable. Typical of monthly flows for which nobody knows precisely when they become available.
• Based on the FR_COD code, the IODAY_CFT table will be loaded by setting the presence of the expected day in the FR_YMD field (a sketch of this loading follows below).
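As a sketch of this loading logic only: in the MEF it is handled by the mef_sta_build.p_objday_cft procedure mentioned later, while the statement below, with its hypothetical calendar_t holiday table, just illustrates how FR_YMD could be set for FR_COD = 'AWD'.

-- Hypothetical sketch: set FR_YMD only for working days (FR_COD = 'AWD').
-- calendar_t and its holiday_flg column are assumptions, not MEF objects.
UPDATE ioday_cft d
   SET d.fr_ymd = d.day_ymd
 WHERE d.io_cod = 'MTF'
   AND TO_CHAR(TO_DATE(TO_CHAR(d.day_ymd), 'YYYYMMDD'), 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
       NOT IN ('SAT', 'SUN')
   AND NOT EXISTS (SELECT 1
                     FROM calendar_t c               -- hypothetical holiday calendar
                    WHERE c.day_ymd = d.day_ymd
                      AND c.holiday_flg = 1);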
12. Reference day configuration
• The third step is to insert into the IO_CFT table the information relating to the expected reference day.
• The DR_COD code indicates what the reference day for the data in the data file should be. Remember that the reference day must be present or implied. The same logic applied to the FR_COD field also applies to the DR_COD field: it will serve to set the IODAY_CFT table. For the moment I have defined some commonly used values (see the sketch after this list):
• 0 = the reference day coincides with the current day.
• 1 = the reference day coincides with the day before, that is, the current day -1.
• 1W = the reference day is the first preceding business day.
• The configuration of the IODAY_CFT table occurs only once, during the configuration of the data file. Afterwards, it no longer needs to change.
• Note that the use of the codes is just a way to quickly facilitate the setting of the IODAY_CFT table. Nothing prevents you from modifying the table manually or with ad-hoc SQL.
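For example, a possible derivation of the expected reference day for DR_COD = '1W' could look like the sketch below. The real logic lives in the mef_sta_build.p_dr_cod function cited later; this version is an assumption that skips Saturdays and Sundays only, ignoring holidays.

-- Hypothetical sketch: previous working day (weekend handling only).
SELECT day_ymd,
       TO_CHAR(
         CASE TO_CHAR(TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD'), 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
           WHEN 'MON' THEN TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD') - 3   -- Monday -> previous Friday
           WHEN 'SUN' THEN TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD') - 2   -- Sunday -> previous Friday
           ELSE            TO_DATE(TO_CHAR(day_ymd), 'YYYYMMDD') - 1   -- any other day -> previous day
         END, 'YYYYMMDD') AS dr_ymd
  FROM ioday_cft
 WHERE io_cod = 'MTF';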
13. Configuration of the correction factor
• The OFF_COD code present in IO_CFT indicates the correction factor to be applied to the reference day indicated by the feeding system. OFF_COD does not take part in the control; it acts as a corrector of the day at run-time (a sketch follows below). For the moment I have defined some commonly used codes:
• 0 = the reference day coincides with the day indicated by the feeding system.
• 1 = the reference day coincides with the day before, that is, the current day -1.
• 1W = the reference day coincides with the previous working day.
• The FROM_DR_YMD and TO_DR_YMD fields have the same meaning as the FR_COD field, but allow you to identify a range of possible reference days. For the moment, only one code has been defined:
• PM = the previous month of the current calendar day.
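A minimal sketch of the run-time correction, assuming a simplified stand-alone function: the actual MEF code is the mef_sta.f_off_cod function cited later, so the name, signature and weekend-only handling below are assumptions.

-- Hypothetical sketch: apply the OFF_COD correction to the day declared by the feeding system.
CREATE OR REPLACE FUNCTION f_apply_off_cod (
  p_off_cod IN VARCHAR2,   -- '0', '1' or '1W'
  p_day_ymd IN VARCHAR2    -- day declared by the feeding system, YYYYMMDD
) RETURN VARCHAR2 IS
  v_day DATE := TO_DATE(p_day_ymd, 'YYYYMMDD');
BEGIN
  IF p_off_cod = '0' THEN
    RETURN p_day_ymd;                                   -- no correction
  ELSIF p_off_cod = '1' THEN
    RETURN TO_CHAR(v_day - 1, 'YYYYMMDD');              -- previous day
  ELSIF p_off_cod = '1W' THEN                           -- previous working day (Sat/Sun only)
    CASE TO_CHAR(v_day, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
      WHEN 'MON' THEN RETURN TO_CHAR(v_day - 3, 'YYYYMMDD');
      WHEN 'SUN' THEN RETURN TO_CHAR(v_day - 2, 'YYYYMMDD');
      ELSE            RETURN TO_CHAR(v_day - 1, 'YYYYMMDD');
    END CASE;
  END IF;
  RETURN p_day_ymd;                                     -- unknown code: leave the day unchanged
END f_apply_off_cod;
/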
MEF:
The data file is in the .. dat folder and is called mtf_export_20160314.csv.
The control file with the expected number of rows is called mtf_export_20160314.row; it is also in the .. dat folder.
The file that configures the data file field structure is in the .. cft folder and is called mtf.csv.
The configuration file of the data file is called io_mtf.txt and is under the .. cft folder. It has the following settings:
14. The configuration file
IO_COD: MTF (file identifier)
IO_DEB: Multilateral Trading Facilities (file description)
TYPE_COD: FIN (file type - input file)
SEC_COD: ESM (feeding system: ESMA)
FRQ_COD: D (frequency - daily)
FILE_LIKE_TXT: mtf_export%.csv (generic name of the file, without the day)
FILE_EXT_TXT: mtf_export_20160314.csv (name of the sample data file)
HOST_NC: ., (priority to the decimal point)
HEAD_CNT: 1 (number of header rows)
FOO_CNT: 0 (number of tail rows)
SEP_TXT: , (separator symbol if csv)
START_NUM: 12 (starting character of the day in the name)
SIZE_NUM: 8 (size of the day)
RROW_NUM: 2 (row of the control file that contains the number of file rows)
RSTART_NUM: 8 (character at which the number of rows begins)
RSIZE_NUM: 6 (size of the number)
MASK_TXT: YYYYMMDD (format of the day)
FR_COD: AWD (file reference code)
DR_COD: 1W (day reference code)
OFF_COD: 1W (offset on the day reference)
RCF_LIKE_TXT: mtf_export%.row (generic name of the control file, without the day)
RCF_EXT_TXT: mtf_export_20160314.row (name of the sample control file)
FTB_TXT: NEWLINE (row-end indicator for the Oracle external table)
TRUNC_COD: 1 (indicates whether the staging table should be truncated before loading)
NOTE_IO_COD: MTF (presence of a notes file)
15. The configuration file
MEF:
The DR_COD code is managed by the mef_sta_build.p_dr_cod function.
The FR_COD code is managed by the mef_sta_build.p_fr_cod function.
The OFF_COD code is managed by the mef_sta.f_off_cod function. See further details in Recipe 12 on SlideShare.
The functions that handle the day range are mef_sta_build.p_from_dr_cod and mef_sta_build.p_to_dr_cod.
In this way, by changing the functions, we can define other codes. The mef_sta_build.p_objday_cft procedure will load the IODAY_CFT table.
The complete configuration of the data file is done by launching:
SQL> @sta_conf_io MTF
16. The data file loading
• The data file loading process must insert into a log table the information related to the processing day and to the reference day received from the feeding system.
MEF:
SQL> exec mef_job.p_run('sta_esm_mtf');
• By comparing, at the end of the loading, what was configured with what was loaded, we can infer a final outcome of the process. This comparison can be displayed by means of a view which we will call IODAY_CFV (a query sketch follows below).
• The logic with which the view works was summarized in a previous figure. On the basis of this outcome, an intervention strategy must be agreed upon.
• In our example, launched on a working day, we see that there is a problem related to the reference day.
• There is also another problem to be investigated: the number of rows declared in the control file is different from the number of rows loaded.
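A sketch of how this comparison could be inspected; only the view name IODAY_CFV comes from the text, while the column names below are assumptions.

-- Hypothetical sketch: inspect the outcome of today's load for the MTF data file.
SELECT io_cod,
       day_ymd,         -- calendar day of the load (assumed column)
       fr_ymd,          -- expected arrival day (assumed column)
       dr_ymd,          -- expected reference day (assumed column)
       outcome_txt      -- outcome derived by the control logic (assumed column)
  FROM ioday_cfv
 WHERE io_cod = 'MTF'
   AND day_ymd = TO_CHAR(SYSDATE, 'YYYYMMDD');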
17. Conclusion
• Whatever way we implement an ETL solution, the important point to emphasize is that we need to know, in advance, the time characteristics of the data file we are going to load.
• For each calendar day, we must be clear about what we expect to receive on that day and, for any given data file, what reference day we expect to find inside it.
• There can be no doubt or ambiguity: this is information that we need to know in advance and that we have to configure. After the loading of the Staging Area, only the comparison between what we expected to receive and what we actually received will allow us to evaluate the correctness of the loaded data.
• Just remember that this correctness check is a priority, it is the first check, and it refers only to the two time components of the data. Only if these checks are positive does it make sense to continue with the other quality controls.
18. References
On Slideshare:
the series: Recipes of Data Warehouse and Business Intelligence.
Blog:
http://microetlfoundation.blogspot.it
http://massimocenci.blogspot.it/
Micro ETL Foundation free source at:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Last version v2.
Email:
massimo_cenci@yahoo.it