Target Corporation




   BI Framework
   Error Processing
   Mohan.Kumar2
Table of Contents

1.     Exception Handling Overview (ref 2.5.2)
     1.1.     Data Reprocessing
     1.2.     Infrastructure Exception Handling
     1.3.     Data Correction in DWH
2.     Error Processing – High Level
     2.1.     Capturing
     2.2.     Error threshold
     2.3.     Purging
        2.3.1.        Landing Area
        2.3.2.        Staging Area
        2.3.3.        EDW
        2.3.4.        Datamart
     2.4.     Purge threshold
     2.5.     Appendix
        2.5.1.        About Target
        2.5.2.        Reference
        2.5.3.        Other Contributors




1. Exception Handling Overview (ref 2.5.2)




Exception Handling deals with any abnormal termination, unacceptable event or incorrect data that
can impact the data flow or accuracy of data in the warehouse/mart.

Exceptions in ETL could be classified as Data Related Exceptions and Infrastructure Related
Exceptions.


Please note: temporary infrastructure glitches are not classified as exceptions, since they are usually
resolved by the time the job(s) are rerun. Their logs are nevertheless tracked and maintained.

The process of recovering or gracefully exiting when an exception occurs is called exception handling.




Data related exceptions are caused by incorrect data formats, incorrect values, or incomplete data
from the source system. These lead to data validation exceptions and data rejects. The process of
handling the data rejects is called Data Reprocessing.



Infrastructure related exceptions are caused by issues in the network, the database or the
operating system. Common infrastructure exceptions are FTP failures, database connectivity failures,
a full file system, etc.

Data related exceptions are usually documented in the requirements; if not, they must be, because
unhandled data related exceptions lead to inaccurate data in the warehouse/mart. We also keep a
threshold on the maximum number of validation or reject failures allowed per load. Any value above
the threshold would mean the data is too inaccurate because of too many rejections.

There is one more exception which is the presence of inaccurate or incorrect data in the warehouse.
This could happen due to

    1.   Incorrect or missed requirements, leading to incorrect ETL.
    2.   Incorrect interpretation of requirements leading to incorrect ETL.
    3.   Uncaught coding defects.
    4.   Incorrect data from source.

The process of correcting data already loaded in the warehouse involves both fixing the data already
loaded and preventing the inaccuracy from recurring in the future.




    1.1.         Data Reprocessing

Reprocessing is an exception handling process which involves the correction of data that could
not be loaded into the warehouse/mart.

There could be many reasons why source data gets rejected from the DWH. The most common are:

         Data Rejection - Source data not matching critical business codes/attributes. This is
called a Lookup Failure in ETL.
         Data Cleansing - Source data containing junk values for business-critical fields, hence
getting rejected during data validation.

There are three options for dealing with the rejected records. One, we could leave the rejected
data out of the DWH. Two, we could correct it, based on whether the rejected field is critical to
the business and worth reprocessing, and then load it into the DWH. The last option is to have the
data corrected at the source system and extracted again. The process of correcting the rejected
data and then loading it into the DWH is called Data Reprocessing.




As depicted in the figure above, data is rejected during the data validation, data cleansing and
data transformation processes. The rejected data is collected in temporary files on the ETL server
while the ETL is running. Once the ETL is complete, the rejected data is moved into the Landing
Area.

The end user and the business analyst are provided interfaces to read the reject data in the landing
area. They take this as input, analyze the cause of rejection and correct the data at the source
itself. Once the data is corrected at the source, it is extracted again (depicted by the brown line
in the figure). The corrected data is not expected to be rejected again unless the correction was
insufficient.
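The reject-collection flow above (temporary files during the run, moved to the landing area once the ETL completes) can be sketched as follows. The directory and file names are hypothetical, not part of the actual framework:

```python
import shutil
from pathlib import Path

def move_rejects_to_landing(tmp_dir: str, landing_dir: str) -> list:
    """After the ETL completes, move reject files from the ETL server's
    temporary area into the landing area for analyst review."""
    landing = Path(landing_dir)
    landing.mkdir(parents=True, exist_ok=True)
    moved = []
    for reject_file in sorted(Path(tmp_dir).glob("*.rej")):
        target = landing / reject_file.name
        shutil.move(str(reject_file), str(target))
        moved.append(target.name)
    return moved

# Demo with two hypothetical reject files written during an ETL run.
tmp = Path("tmp_rejects")
tmp.mkdir(exist_ok=True)
(tmp / "orders.rej").write_text("bad_row_1\n")
(tmp / "items.rej").write_text("bad_row_2\n")
moved = move_rejects_to_landing("tmp_rejects", "landing_area")
```

The analyst-facing interfaces would then read the files under the landing directory.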




Some business-critical data warehouses have very low tolerance for inaccurate data and need a
sophisticated, fast mechanism for handling rejected data in the landing area. Here we consider a
database to land the data. The database schema is the same as that of the source files/tables,
with two additional columns: one to flag whether the record was rejected by the ETL, and the other
to identify the date when the data was sent by the source system. Having a database makes it easy
to create applications that access and update the data in the landing area.

Please note that adding a database to the landing area adds infrastructure and maintenance costs.
It would also increase the number of processes in the extraction phase, thereby affecting the
performance of the ETL.
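A minimal sketch of such a landing table, using SQLite purely for illustration. The table and column names are assumptions, not the actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The landing table mirrors the source schema, plus the two audit columns
# described above: reject_flag marks records rejected by the ETL, and
# source_date records when the source system sent the data.
conn.execute("""
    CREATE TABLE landing_orders (
        order_id    INTEGER,
        amount      REAL,
        reject_flag CHAR(1) DEFAULT 'N',
        source_date TEXT
    )
""")
conn.execute("INSERT INTO landing_orders VALUES (1, 10.5, 'N', '2024-01-01')")
conn.execute("INSERT INTO landing_orders VALUES (2, NULL, 'Y', '2024-01-01')")

# An analyst-facing application can query the rejects directly:
rejects = conn.execute(
    "SELECT order_id FROM landing_orders WHERE reject_flag = 'Y'"
).fetchall()
```

Because the rejects live in ordinary rows rather than flat files, update-and-resubmit applications are straightforward to build on top.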




   1.2.        Infrastructure Exception Handling


Infrastructure related exceptions are caused by issues in network connectivity, database
operations and the operating system.

Common Infrastructure exceptions are




Database errors such as connection failures, referential integrity constraint failures, primary key
constraint failures, incorrect credentials, data type mismatches, and NULLs in NOT NULL fields.
Network connection failures causing FTP failures.
Operating system issues on the ETL server, such as aborts due to insufficient memory, unmounted
file systems, 100% CPU utilization, incorrect file/directory permissions, or a full file system.

The diagram below depicts the exceptions and the process to handle them.

The above exceptions are generally detected by the ETL scheduler, which checks whether the ETL
process returned a non-zero value.

If an exception occurs, we make a log entry, send email or alerts notifying users that the ETL
process has aborted, and exit to the operating system with a non-zero value.

The notification alerts the IS team to take appropriate action so that the ETL process can be
restarted once the infrastructure issue is resolved.
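The scheduler-side detection described above can be sketched as follows. The command being run and the notification hook are illustrative placeholders, not the actual scheduler's interface:

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_scheduler")

notifications = []  # stand-in for the email/alert integration

def notify_support(command, code):
    # Placeholder for the mail/alert hook that pages the IS team.
    notifications.append((tuple(command), code))

def run_etl(command: list) -> int:
    """Run an ETL step; on a non-zero exit code, log the failure and
    notify the support team, then return the code to the caller."""
    result = subprocess.run(command)
    if result.returncode != 0:
        log.error("ETL step %s aborted with code %d", command, result.returncode)
        notify_support(command, result.returncode)
    return result.returncode

# Simulate an ETL process that aborts with a non-zero exit code.
rc = run_etl([sys.executable, "-c", "raise SystemExit(3)"])
```

Once the IS team resolves the infrastructure issue, the same command can be re-submitted to the scheduler.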




1.3.   Data Correction in DWH




The data in the DWH could be incorrect or inaccurate for a variety of reasons, mainly:

    1. Incorrect or missed requirements, leading to incorrect ETL.
    2. Incorrect interpretation of requirements, leading to incorrect ETL.
    3. Uncaught coding defects.
    4. Incorrect data from the source.

Reasons 1, 2 and 3 require us to revisit the ETL code with respect to the incorrect requirements,
missed requirements and uncaught defects.

The figure below depicts the process to be followed to correct data already loaded in the DWH.

Detection

Most important is the detection of inaccurate or incorrect data in the DWH. Incorrect data loaded
in the DWH is usually detected long after it has been loaded, when an end user identifies it in a
report.

Analysis

Once reported, we analyze the report and its metadata. This requires understanding the report
metadata, the calculations and the SQL generated by the report.

If there is no issue in the report definition, we analyze the data in the DWH. Once we have
pinpointed the table, attributes and data where the inaccuracy lies, we perform a root cause
analysis.

The root cause analysis requires us to check the data with respect to the requirements, design and
code. The root cause helps us identify the next course of action.

Missing Requirements - If the root cause is missing requirements, we go back to the users and
gather the complete requirements.

Misinterpretation of Requirements - Here too, we go back to the end user and clarify the
misinterpreted requirement.

Defect in the code - There is a possibility of bugs going undetected during the testing phase. If
undetected, a bug could cause inaccuracy in the data.

Correction Process

In case of missing requirements,

    1.    Get the new requirements from the users.
    2.    Document the new requirements.
    3.    Design the new ETL.
    4.    Code the new ETL.
    5.    Test the new ETL.
    6.    Take the DWH offline.
    7.    Perform the history load for the new requirements. This is possible only when we have
          added new tables or new attributes to the data model.
    8.    Check the report for new requirements.
    9.    If the reports are correct, then implement the new ETL into the regular ETL.
    10.   Perform the catch-up load for the duration the DWH was offline.
    11.   Bring the DWH online.

In case of misinterpreted requirements or undetected bugs,

    1.    Analyze the ETL and identify the changes in it.
    2.    Update the design.
    3.    Correct the code.
    4.    Test the code.
    5.    Create a patch to update the historical data (data already in DWH) to correct it.
    6.    Test the patch.
    7.    Take the DWH offline.
    8.    Run the patch.
    9.    Check the report for correction.
    10.   If the reports are correct, then implement the corrected ETL.
    11.   Perform the catch-up load for the duration the DWH was offline.
    12.   Bring the DWH online.




2. Error Processing – High Level




The error processing framework in Target standardizes how errors are captured, how thresholds are
enforced, and how obsolete data is purged.

   2.1. Capturing
       Data from all the source systems is dumped into the landing area as is. All records
       in the landing area are initially marked as valid during the load.

       On a given schedule, records are processed from the landing area to the staging area,
       and all business validations are executed on them. Once the staging load has finished,
       all records that were not loaded into the staging area are marked as invalid in the
       landing area.

       Information about every rejected record is stored in the error tables along with an
       error code. A separate reference table maps each error code to its description.

       Depending on the table(s), a record may be subject to multiple business validations,
       so a single source record can end up with multiple entries in the error table(s).

       Records marked as invalid are reprocessed on every staging load until they are purged
       or a corrected record is sent from the source.
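The capture step can be sketched as below. The validation rules, error codes and field names are invented for illustration, not taken from the actual framework:

```python
# Hypothetical validation rules; each failure yields one error-table row,
# so a single source record can produce several entries.
ERROR_CODES = {
    "E001": "missing customer id",
    "E002": "negative amount",
}

def validate(record: dict) -> list:
    """Return (row_id, error_code) pairs for every failed business rule."""
    errors = []
    if not record.get("customer_id"):
        errors.append((record["row_id"], "E001"))
    if record.get("amount", 0) < 0:
        errors.append((record["row_id"], "E002"))
    return errors

error_table = []
landing = [
    {"row_id": 1, "customer_id": "C1", "amount": 10},
    {"row_id": 2, "customer_id": None, "amount": -5},  # fails both rules
]
valid = []
for rec in landing:
    errs = validate(rec)
    if errs:
        error_table.extend(errs)  # record stays marked invalid in landing
    else:
        valid.append(rec)
```

Note that row 2 produces two error-table entries, matching the multiple-validations behaviour described above.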

   2.2. Error threshold
       If the number of rejections reaches a given threshold limit, a mail is sent to the
       EAM / Business data quality team reporting the abnormal behavior, and the job is
       aborted.


Based on the team's feedback, the jobs are then rerun/re-triggered manually.
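A minimal sketch of the threshold check, assuming a percentage-based limit and a pluggable notification callback (the real framework's interface is not documented here):

```python
class ThresholdExceeded(Exception):
    """Raised to abort the job when rejections cross the threshold."""

def check_error_threshold(reject_count, total, threshold_pct, notify):
    """Abort the load when the rejection rate exceeds the configured
    threshold, after notifying the data quality team."""
    if total and (reject_count / total) * 100 > threshold_pct:
        notify(f"{reject_count}/{total} records rejected; job aborted")
        raise ThresholdExceeded(reject_count)

alerts = []
# Under the threshold: the load proceeds silently.
check_error_threshold(3, 1000, 5.0, alerts.append)
# Over the threshold: a mail is sent and the job aborts.
try:
    check_error_threshold(90, 1000, 5.0, alerts.append)
    aborted = False
except ThresholdExceeded:
    aborted = True
```

Whether the threshold is a percentage or an absolute count is a business decision; the structure is the same either way.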



2.3. Purging
   Purging deletes previous records that are no longer required by a given business
   process.

   The purging logic is as follows:

   2.3.1. Landing Area
          1. Valid records – valid records that have been loaded into the Staging area
             are retained for the previous 7 days only; the rest are purged.

          2. Invalid records – invalid records that have errored out of the Staging
             area are retained for 30 days; the rest are purged.

   2.3.2. Staging Area
          Truncate and load: an area where we load data and make sure it is good
          before making any changes to the warehouse tables.

   2.3.3. EDW
          Depending on business need, data is maintained in the EDW.

   2.3.4. Datamart
          Depending on business need, data is maintained in the data mart.

2.4. Purge threshold
   During purging, the business can set a threshold limit on the number of records
   being purged. If the threshold limit is crossed while deleting, the purge jobs are
   automatically aborted and a mail is sent to the EAM / Business data quality team
   for confirmation.

   Once the business confirms, the aborted jobs are triggered manually.
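A sketch of the purge-threshold behaviour, assuming a simple record-count limit; the function and parameter names are illustrative:

```python
deleted, mails = [], []

def run_purge(candidate_ids, limit, notify, confirmed=False):
    """Delete purge candidates unless their count crosses the business-set
    limit; in that case abort and mail the team for confirmation."""
    if len(candidate_ids) > limit and not confirmed:
        notify(f"purge of {len(candidate_ids)} records exceeds limit {limit}")
        return "aborted"
    deleted.extend(candidate_ids)
    return "purged"

# First run crosses the limit: the job aborts and a mail goes out.
first = run_purge(list(range(500)), limit=100, notify=mails.append)
# After the business confirms, the job is re-triggered manually.
second = run_purge(list(range(500)), limit=100, notify=mails.append, confirmed=True)
```

The confirmation flag stands in for the manual re-trigger step; in the real framework the rerun would come from the scheduler after the business responds.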



2.5. Appendix

   2.5.1. About Target
          TBU

2.5.2. Reference
     The Exception Handling Overview is an extract from www.dwhinfo.com written
     by Krishan.Vinayak@target.com

2.5.3. Other Contributors


           Krishan.Vinayak – Delivery Manager

           Devanathan.Rajagopalan – Senior Technical Architect

           Asis.Mohanty – BI Manager

           Joseph.Raj – Technical Architect





What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 

Recently uploaded (20)

Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 

BI Error Processing Framework

  • 1. Target Corporation BI Framework Error Processing Mohan.Kumar2
  • 2. Table of Contents
    1. Exception Handling Overview (ref 2.5.2)
      1.1. Data Reprocessing
      1.2. Infrastructure Exception Handling
      1.3. Data Correction in DWH
    2. Error Processing – High Level
      2.1. Capturing
      2.2. Error threshold
      2.3. Purging
        2.3.1. Landing Area
        2.3.2. Staging Area
        2.3.3. EDW
        2.3.4. Datamart
      2.4. Purge threshold
      2.5. Appendix
        2.5.1. About Target
        2.5.2. Reference
        2.5.3. Other Contributors
  • 3. 1. Exception Handling Overview (ref 2.5.2)
    Exception handling deals with any abnormal termination, unacceptable event, or incorrect data that can impact the data flow or the accuracy of data in the warehouse/mart. Exceptions in ETL can be classified as Data Related Exceptions and Infrastructure Related Exceptions.
    Please note: temporary infrastructure glitches are not classified as exceptions, since they are usually resolved by the time the job(s) are rerun; their logs are still tracked and maintained.
    The process of recovering, or gracefully exiting, when an exception occurs is called exception handling.
  • 4. Data related exceptions are caused by incorrect data formats, incorrect values, or incomplete data from the source system. These lead to data validation exceptions and data rejects. The process of handling the data rejects is called Data Reprocessing.
  • 5. Infrastructure related exceptions are caused by issues in the network, the database, and the operating system. Common infrastructure exceptions are FTP failure, database connectivity failure, a full file system, etc.
    Data related exceptions are usually documented in the requirements; if they are not, they must be, because unhandled data related exceptions lead to inaccurate data in the warehouse/mart. We also keep a threshold on the maximum number of validation or reject failures allowed per load: any value above the threshold means the data would be too inaccurate due to too many rejections.
    There is one more class of exception: the presence of inaccurate or incorrect data already in the warehouse. This can happen due to:
    1. Incorrect or missed requirements, leading to incorrect ETL.
    2. Incorrect interpretation of requirements, leading to incorrect ETL.
    3. Uncaught coding defects.
    4. Incorrect data from the source.
    Correcting data already loaded in the warehouse involves both fixing the loaded data and preventing the inaccuracy from persisting in the future.
    1.1. Data Reprocessing
    Reprocessing is an exception handling process that corrects data that could not be loaded into the warehouse/mart. There are many reasons why source data gets rejected from the DWH; the most common are:
    Data Rejection – source data not matching critical business codes/attributes. This is called a Lookup Failure in ETL.
    Data Cleansing – source data containing junk values for business critical fields, and hence getting rejected during data validation.
    There are two main options for dealing with rejected records: we can leave the rejected data out of the DWH, or, when the rejected field is critical to business and worth reprocessing, we can correct it and then load it into the DWH. The process of correcting the rejected data and then loading it into the DWH is called Data Reprocessing.
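    The reject-routing step described above can be sketched as follows. This is a minimal illustration, not the framework's actual code: the field names, error codes, and the lookup set are assumptions standing in for per-table business validations.

```python
# Minimal sketch of routing records to rejects during validation.
# VALID_DEPT_CODES stands in for a business code lookup; a miss on it
# models a Lookup Failure, and a junk amount models a cleansing reject.
VALID_DEPT_CODES = {"D01", "D02", "D03"}  # illustrative assumption

def validate(record: dict) -> list:
    """Return a list of error codes; an empty list means the record is clean."""
    errors = []
    if record.get("dept_code") not in VALID_DEPT_CODES:
        errors.append("ERR_LOOKUP_DEPT")        # lookup failure
    if not str(record.get("sales_amt", "")).replace(".", "", 1).isdigit():
        errors.append("ERR_INVALID_AMOUNT")     # junk value in a critical field
    return errors

def split_load(records):
    """Split a batch into loadable rows and rejects tagged with error codes."""
    loaded, rejects = [], []
    for rec in records:
        errs = validate(rec)
        if errs:
            rejects.append({"record": rec, "error_codes": errs})
        else:
            loaded.append(rec)
    return loaded, rejects
```

    The rejects, with their error codes attached, are what would later be handed to the reprocessing flow.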
  • 6. As depicted in the figure above, data is rejected during the data validation, data cleansing, and data transformation processes. The rejected data is collected in temporary files on the ETL server while the ETL is running. Once the ETL is complete, the rejected data is moved into the Landing Area. The end user and the business analyst are provided interfaces to read the reject data in the landing area. They take this as input, analyze the cause of rejection, and correct the data at the source itself. Once the data is corrected at the source, it is extracted again (depicted by the brown line in the figure). The corrected data is not expected to get rejected again unless the correction provided was insufficient.
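    The post-ETL step of moving reject files into the landing area can be sketched as below. The directory layout and the `.rej` suffix are illustrative assumptions; the framework's actual file conventions are not specified in this document.

```python
import shutil
from pathlib import Path

def move_rejects_to_landing(etl_tmp_dir: str, landing_dir: str) -> list:
    """After the ETL completes, move the temporary reject files collected
    on the ETL server into the landing area for analyst review."""
    landing = Path(landing_dir)
    landing.mkdir(parents=True, exist_ok=True)
    moved = []
    for reject_file in sorted(Path(etl_tmp_dir).glob("*.rej")):  # assumed suffix
        target = landing / reject_file.name
        shutil.move(str(reject_file), str(target))
        moved.append(target.name)
    return moved
```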
  • 7. Some business critical data warehouses have very low tolerance for inaccurate data and need a sophisticated, fast mechanism for handling rejected data in the landing area. Here we use a database to land the data. The database schema is the same as that of the source files/tables, with two additional columns: one to flag whether the record was rejected in ETL, and one to identify the date when the data was sent by the source system. Having a database makes it easy to create applications that access and update the data in the landing area. Please note that adding a database to the landing area adds infrastructure and maintenance costs. It also increases the number of processes in the extraction path, thereby affecting ETL performance.
    1.2. Infrastructure Exception Handling
    Infrastructure related exceptions are caused by issues in network connectivity, database operations, and the operating system. Common infrastructure exceptions are:
  • 8. Database errors such as connection failures, referential integrity constraint violations, primary key constraint violations, incorrect credentials, data type mismatches, and NULLs in NOT NULL fields.
    Network connection failures causing FTP failure.
    Operating system issues on the ETL server, such as a full file system, aborts due to insufficient memory, un-mounted file systems, 100% CPU utilization, and incorrect file/directory permissions.
    The diagram below depicts the exceptions and the process to handle them. These exceptions are generally detected by the ETL scheduler, which checks whether the ETL process returned a non-zero value. If an exception occurs, we make a log entry, send email or alerts to notify users that the ETL process has aborted, and exit to the operating system with a non-zero value. The notification alerts the IS team to take appropriate action so that the ETL process can be restarted once the infrastructure issue is resolved.
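    The detect/log/notify pattern above can be sketched as follows. The `notify` hook and the ETL command are placeholders; the real framework's scheduler and alert channel (mail, pager, etc.) are not detailed in this document.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_scheduler")

def notify(message: str) -> None:
    """Stand-in for the email/alert hook described above."""
    log.error("ALERT: %s", message)

def run_job(command: list) -> int:
    """Run an ETL step; on a non-zero exit, log the abort and alert
    the IS team, then propagate the non-zero code to the caller."""
    result = subprocess.run(command)
    if result.returncode != 0:
        log.error("ETL process aborted with code %d", result.returncode)
        notify("ETL aborted (exit %d); IS team action needed" % result.returncode)
    return result.returncode
```

    A wrapper script would call `sys.exit(run_job([...]))` so that the scheduler itself sees the non-zero value and can restart the job once the infrastructure issue is resolved.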
  • 9. 1.3. Data Correction in DWH
    The data in the DWH could be incorrect or inaccurate for a variety of reasons, mainly:
    1. Incorrect or missed requirements, leading to incorrect ETL.
    2. Incorrect interpretation of requirements, leading to incorrect ETL.
    3. Uncaught coding defects.
    4. Incorrect data from the source.
    Reasons 1, 2, and 3 require us to revisit the ETL code with respect to the incorrect requirements, missed requirements, and uncaught defects. The figure below depicts the process to be followed to correct data already loaded in the DWH.
    Detection – The most important step is detecting the inaccurate or incorrect data in the DWH. Incorrect data is usually detected long after it has been loaded, when an end user identifies it in a report.
    Analysis – Once reported, we analyze the report and its metadata. This requires understanding the report metadata, calculations, and the SQL generated by the report.
  • 10. If there is no issue in the report definition, we analyze the data in the DWH. Once we have pinpointed the table, attributes, and data where the inaccuracy lies, we perform a root cause analysis, checking the data against the requirements, design, and code. The root cause identifies the next course of action:
    Missing requirements – we go to the users and get the complete requirements.
    Misinterpretation of requirements – here too we go to the end user and clarify the misinterpreted requirement.
    Defect in the code – bugs can go undetected during the testing phase; if undetected, a bug can cause inaccuracy in the data.
    Correction Process
    In case of missing requirements:
    1. Get the new requirements from the users.
    2. Document the new requirements.
    3. Design the new ETL.
    4. Code the new ETL.
    5. Test the new ETL.
    6. Take the DWH offline.
    7. Perform the history load for the new requirements. This is possible only when we have added new tables or new attributes to the data model.
    8. Check the report for the new requirements.
    9. If the reports are correct, implement the new ETL into the regular ETL.
    10. Perform the catch-up load for the duration the DWH was offline.
    11. Bring the DWH online.
    In case of misinterpreted requirements or undetected bugs:
    1. Analyze the ETL and identify the changes required.
    2. Update the design.
    3. Correct the code.
    4. Test the code.
    5. Create a patch to correct the historical data (data already in the DWH).
    6. Test the patch.
    7. Take the DWH offline.
    8. Run the patch.
    9. Check the report for the correction.
    10. If the reports are correct, implement the corrected ETL.
    11. Perform the catch-up load for the duration the DWH was offline.
    12. Bring the DWH online.
  • 11. 2. Error Processing – High Level
    The error processing at Target follows the flow described below.
    2.1. Capturing
    Data from the various source systems is dumped into the landing area as-is. All records in the landing area are initially marked as valid during the load. On a given schedule, the records are processed from the landing area to the staging area, and all business validations are executed on them. Once the staging load is finished, all records that were not loaded into the staging area are marked as invalid in the landing area. Information about every rejected record is stored in the error tables along with an error code; a separate reference table describes each error code. Depending on the table(s), there can be multiple business validations per record, so a given source record can end up with multiple entries in the error table(s). Records marked as invalid are processed in every staging load until they are purged or a corrected record is sent from the source.
    2.2. Error threshold
    If the number of rejections reaches a given threshold limit, a mail is sent to the EAM / Business data quality team reporting the abnormal behavior, and the job is aborted.
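    The error-threshold check can be sketched as below. The threshold value and notification hook are illustrative assumptions; the document does not state the actual limit or alerting mechanism.

```python
REJECTION_THRESHOLD = 100  # assumed per-load limit; the real value is configurable

class ThresholdExceeded(Exception):
    """Raised to abort the job when rejections exceed the allowed limit."""

def check_error_threshold(rejected_count: int, notify) -> None:
    """Alert the EAM / data quality team and abort the staging load when
    the number of rejected records reaches the threshold."""
    if rejected_count >= REJECTION_THRESHOLD:
        notify("%d rejections in this load; job aborted" % rejected_count)
        raise ThresholdExceeded(rejected_count)
```

    The sketch only aborts and notifies; per the process above, the job is rerun manually after the team's feedback rather than retried automatically.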
  • 12. Based on the feedback, the jobs are rerun/re-triggered manually.
    2.3. Purging
    Purging deletes older records that are no longer required by a given business process. The purging logic applied to each area is as follows:
    2.3.1. Landing Area
    1. Valid records – valid records that have been loaded into the staging area are retained for the previous 7 days only; the rest are purged.
    2. Invalid records – invalid records that have errored out of the staging area are retained for 30 days; the rest are purged.
    2.3.2. Staging Area
    Truncate and load. An area where we load and make sure the data is good before we make any changes to the warehouse tables.
    2.3.3. EDW
    Depending on business need, data is maintained in the EDW.
    2.3.4. Datamart
    Depending on business need, data is maintained in the datamart.
    2.4. Purge threshold
    During purging, the business can set a threshold limit on the number of records being purged. If the threshold limit is crossed while deleting, the purge jobs are automatically aborted and a mail is sent to the EAM / Business data quality team for confirmation. Once the business confirms, the aborted jobs are triggered manually.
    2.5. Appendix
    2.5.1. About Target
    TBU
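    The landing-area retention rules and the purge threshold can be sketched together as follows. The 7-day and 30-day windows come from the rules above; the row shape and field names are illustrative assumptions.

```python
from datetime import date

VALID_RETENTION_DAYS = 7     # loaded records kept for the previous 7 days
INVALID_RETENTION_DAYS = 30  # errored records kept for 30 days

def purge_landing(rows, today, threshold=None):
    """Return (kept, purged) per the landing-area retention rules.
    If a purge threshold is set and exceeded, abort without deleting,
    mirroring section 2.4's confirmation step."""
    kept, purged = [], []
    for row in rows:
        limit = INVALID_RETENTION_DAYS if row["is_rejected"] else VALID_RETENTION_DAYS
        age = (today - row["source_date"]).days
        (purged if age > limit else kept).append(row)
    if threshold is not None and len(purged) > threshold:
        raise RuntimeError("purge threshold exceeded: %d rows" % len(purged))
    return kept, purged
```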
  • 13. 2.5.2. Reference
    The Exception Handling Overview is an extract from www.dwhinfo.com written by Krishan.Vinayak@target.com.
    2.5.3. Other Contributors
    Krishan.Vinayak – Delivery Manager
    Devanathan.Rajagopalan – Senior Technical Architect
    Asis.Mohanty – BI Manager
    Joseph.Raj – Technical Architect