BI Error Processing Framework
Target Corporation
BI Framework – Error Processing
Mohan.Kumar2
Table of Contents

1. Exception Handling Overview (ref 2.5.2)
   1.1. Data Reprocessing
   1.2. Infrastructure Exception Handling
   1.3. Data Correction in DWH
2. Error Processing – High Level
   2.1. Capturing
   2.2. Error threshold
   2.3. Purging
        2.3.1. Landing Area
        2.3.2. Staging Area
        2.3.3. EDW
        2.3.4. Datamart
   2.4. Purge threshold
   2.5. Appendix
        2.5.1. About Target
        2.5.2. Reference
        2.5.3. Other Contributors
1. Exception Handling Overview (ref 2.5.2)

Exception Handling deals with any abnormal termination, unacceptable event, or incorrect data that can impact the data flow or the accuracy of the data in the warehouse/mart.

Exceptions in ETL can be classified as Data Related Exceptions and Infrastructure Related Exceptions.

Please Note: Temporary infrastructure glitches are not classified as exceptions, because they are usually resolved by the time the job(s) is/are rerun. Their logs are still tracked and maintained.

The process of recovering, or exiting gracefully, when an exception occurs is called exception handling.
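The two exception categories above could be modeled as an exception hierarchy, as in this minimal sketch. The class names and the recovery actions are illustrative assumptions, not part of the actual framework:

```python
class ETLException(Exception):
    """Base class for exceptions raised during an ETL run."""

class DataException(ETLException):
    """Incorrect format, incorrect value, or incomplete source data."""
    def __init__(self, record_id, reason):
        super().__init__(f"record {record_id}: {reason}")
        self.record_id = record_id
        self.reason = reason

class InfrastructureException(ETLException):
    """Network, database, or operating-system failure."""
    def __init__(self, component, detail):
        super().__init__(f"{component}: {detail}")
        self.component = component
        self.detail = detail

def handle(exc):
    """Classify an exception so the caller can recover or exit gracefully."""
    if isinstance(exc, DataException):
        return "reject"   # record goes to the reject/reprocessing flow
    if isinstance(exc, InfrastructureException):
        return "abort"    # job aborts; rerun after the issue is resolved
    raise exc             # anything else is unexpected
```

Keeping the two categories as distinct types lets the job runner decide between rejecting a record and aborting the whole load.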
Data related exceptions are caused by incorrect data formats, incorrect values, or incomplete data from the source system. These lead to data validation exceptions and data rejects. The process of handling the data rejects is called Data Reprocessing.
Infrastructure related exceptions are caused by issues in the network, the database, and the operating system. Common infrastructure exceptions are FTP failure, database connectivity failure, a full file system, etc.

Data related exceptions are usually documented in the requirements; if they are not, they must be, because unhandled data related exceptions lead to inaccurate data in the warehouse/mart. We also keep a threshold on the maximum number of validation or reject failures allowed per load; any value above the threshold would mean the data is too inaccurate because of too many rejections.

There is one more kind of exception: the presence of inaccurate or incorrect data in the warehouse. This can happen due to
1. Incorrect or missed requirements, leading to incorrect ETL.
2. Incorrect interpretation of requirements, leading to incorrect ETL.
3. Uncaught coding defects.
4. Incorrect data from the source.
Correcting data already loaded in the warehouse involves both fixing the loaded data and preventing the inaccuracy from persisting in the future.

1.1. Data Reprocessing

Reprocessing is an exception handling process that corrects data that could not be loaded into the warehouse/mart. There are many reasons why source data gets rejected from the DWH. The most common are:
Data Rejection – source data not matching critical business codes/attributes. This is called a Lookup Failure in ETL.
Data Cleansing – source data containing junk values for business-critical fields, and hence getting rejected during data validation.
There are three options for dealing with rejected records: leave the rejected data out of the DWH; correct it (if the rejected field is critical to the business and worth reprocessing) and load it into the DWH; or correct it at the source system so that it is re-extracted in a later load. The process of correcting the rejected data and then loading it into the DWH is called Data Reprocessing.
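The per-load reject threshold mentioned above could be enforced with logic like this sketch. The 5% limit and the function names are assumed for illustration; real limits come from the load's requirements:

```python
REJECT_THRESHOLD = 0.05   # assumed: abort if more than 5% of records reject

def validate_load(records, is_valid):
    """Split records into accepted/rejected; abort when rejects exceed the threshold."""
    accepted, rejected = [], []
    for rec in records:
        (accepted if is_valid(rec) else rejected).append(rec)
    if records and len(rejected) / len(records) > REJECT_THRESHOLD:
        # too many rejections: the loaded data would be too inaccurate
        raise RuntimeError(
            f"{len(rejected)}/{len(records)} records rejected, "
            f"over the {REJECT_THRESHOLD:.0%} threshold"
        )
    return accepted, rejected
```

Aborting the load rather than silently continuing keeps a badly corrupted source feed from reaching the warehouse.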
As depicted in the figure above, we reject data during the data validation, data cleansing, and data transformation processes. The rejected data is collected in temporary files on the ETL server while the ETL is running. Once the ETL is complete, the rejected data is moved into the Landing Area.

The end user and the business analyst are provided interfaces to read the rejected data in the landing area. They take this as input, analyze the cause of rejection, and correct the data at the source itself. Once the data is corrected at the source, it is extracted again (depicted by the brown line in the figure). The corrected data is not expected to be rejected again unless the correction was insufficient.
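The hand-off described above, from temporary reject files on the ETL server to the landing area, might look like the following sketch. The file and directory naming scheme is an illustrative assumption:

```python
import pathlib
import shutil

def collect_reject(tmp_dir, step, line):
    """Append one rejected record to the temp file for an ETL step."""
    tmp = pathlib.Path(tmp_dir)
    tmp.mkdir(parents=True, exist_ok=True)
    with open(tmp / f"rejects_{step}.txt", "a") as f:
        f.write(line + "\n")

def publish_rejects(tmp_dir, landing_dir):
    """After the ETL completes, move all reject files into the landing area."""
    landing = pathlib.Path(landing_dir)
    landing.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in pathlib.Path(tmp_dir).glob("rejects_*.txt"):
        shutil.move(str(path), landing / path.name)  # analysts read these files
        moved.append(path.name)
    return sorted(moved)
```

Publishing rejects only after the run completes means analysts never see a half-written reject file.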
In some business-critical data warehouses with very low tolerance for inaccurate data, we need a sophisticated and fast mechanism for handling rejected data in the landing area. Here we use a database to land the data. The database schema is the same as that of the source files/tables, with two additional columns: one to flag whether the record was rejected in ETL, and the other to identify the date the data was sent by the source system. Having a database makes it easy to create applications that access and update the data in the landing area.

Please note that adding a database to the landing area adds infrastructure and maintenance costs. It also increases the number of processes in the extraction phase, thereby affecting ETL performance.

1.2. Infrastructure Exception Handling

Infrastructure related exceptions are caused by issues in network connectivity, database operations, and the operating system. Common infrastructure exceptions are:
Database errors such as a DB connection error, referential integrity constraint failure, primary key constraint failure, incorrect credentials, data type mismatch, or NULLs in NOT NULL fields.
Network connection failure causing FTP failure.
Operating system issues on the ETL server, such as aborts due to insufficient memory, un-mounted file systems, 100% CPU utilization, or incorrect file/directory permissions.

The diagram below depicts these exceptions and the process to handle them. Detection of the above exceptions is generally done by the ETL scheduler, which checks whether the ETL process returned a non-zero value. If an exception occurs, we make a log entry, send email or alerts to notify the users that the ETL process has aborted, and exit to the operating system with a non-zero value. The notification alerts the IS team to take appropriate action so that the ETL process can be restarted once the infrastructure issue is resolved.
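The scheduler-side detection described above, checking the exit code, logging, notifying, and exiting non-zero, could be sketched as follows. Here `notify_is_team` is a stand-in for the real email/alert mechanism, which the document does not specify:

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl-scheduler")

def notify_is_team(exit_code):
    # placeholder: the real framework would send email/alerts here
    log.warning("alerting IS team: ETL exited with code %d", exit_code)

def run_etl_step(command):
    """Run one ETL step; on failure, log, notify, and exit non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        log.error("ETL aborted (exit %d): %s", result.returncode, result.stderr)
        notify_is_team(result.returncode)
        sys.exit(result.returncode)   # exit to the OS with a non-zero value
    log.info("ETL step completed cleanly")
```

Propagating the child's exit code lets the enclosing scheduler see exactly which failure occurred before triggering a rerun.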
1.3. Data Correction in DWH

The data in the DWH can be incorrect or inaccurate for a variety of reasons, mainly:
1. Incorrect or missed requirements, leading to incorrect ETL.
2. Incorrect interpretation of requirements, leading to incorrect ETL.
3. Uncaught coding defects.
4. Incorrect data from the source.
Reasons 1, 2, and 3 require us to revisit the ETL code with respect to the incorrect requirements, missed requirements, or uncaught defects. The figure below depicts the process to be followed to correct data already loaded in the DWH.

Detection
Most important is the detection of the inaccurate or incorrect data in the DWH. Incorrect data is usually detected long after it has been loaded, when an end user identifies it in a report.

Analysis
Once reported, we analyze the report and its metadata. This requires understanding the report metadata, the calculations, and the SQL generated by the report.
If there is no issue in the report definition, we analyze the data in the DWH. Once we have pinpointed the table, the attributes, and the data where the inaccuracy lies, we perform root cause analysis. This requires checking the data against the requirements, the design, and the code, and it identifies the next course of action.

Missing Requirements – if the root cause is missing requirements, we go to the users and get the complete requirements.
Misinterpretation of Requirements – here too, we go to the end users and clarify the misinterpreted requirement.
Defect in the Code – bugs can go undetected during the testing phase; an undetected bug can cause inaccuracy in the data.

Correction Process

In the case of missing requirements:
1. Get the new requirements from the users.
2. Document the new requirements.
3. Design the new ETL.
4. Code the new ETL.
5. Test the new ETL.
6. Take the DWH offline.
7. Perform the history load for the new requirements. This is possible only when we have added new tables or new attributes to the data model.
8. Check the reports for the new requirements.
9. If the reports are correct, implement the new ETL into the regular ETL.
10. Perform the catch-up load for the duration the DWH was offline.
11. Bring the DWH online.

In the case of misinterpreted requirements or undetected bugs:
1. Analyze the ETL and identify the changes needed.
2. Update the design.
3. Correct the code.
4. Test the code.
5. Create a patch to correct the historical data (data already in the DWH).
6. Test the patch.
7. Take the DWH offline.
8. Run the patch.
9. Check the reports for the correction.
10. If the reports are correct, implement the corrected ETL.
11. Perform the catch-up load for the duration the DWH was offline.
12. Bring the DWH online.
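The patch-and-verify steps of the second checklist could be sketched as a transaction that commits only when a verification query passes, mirroring "run the patch, check the reports, then bring the DWH online". This is a hypothetical sketch with SQLite standing in for the warehouse database:

```python
import sqlite3

def apply_patch(conn, patch_sql, check_sql, expected):
    """Run a data-correction patch; commit only if the check passes."""
    try:
        conn.execute(patch_sql)                        # correct historical data
        (actual,) = conn.execute(check_sql).fetchone()
        if actual != expected:                         # "check the reports"
            raise ValueError(f"check failed: got {actual}, want {expected}")
        conn.commit()                                  # safe to bring the DWH online
    except Exception:
        conn.rollback()                                # leave the data untouched
        raise
```

Verifying inside the transaction means a bad patch never reaches end-user reports.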
2. Error Processing – High Level

The error processing in Target is unique and flawless.

2.1. Capturing
Data from the various source systems is dumped into the landing area as-is. All records in the landing area are initially marked as valid during the load. On a given schedule, the records are processed from the landing area to the staging area, and all business validations are executed on them. Once the staging load is finished, all records that were not loaded into the staging area are marked as invalid in the landing area. Information about each rejected record is stored in the error tables along with an error code; a separate reference table maps each error code to its description. Depending on the table(s), there can be multiple business validations per record, so a given source record can end up with multiple entries in the error table(s). Records marked as invalid are reprocessed in every staging load until they are purged or a corrected record is sent from the source.

2.2. Error threshold
If the number of rejections reaches a given threshold limit, a mail is sent to the EAM / Business data quality team reporting the abnormal behavior, and the job is aborted.
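The capture flow above could be sketched as follows: each failed business validation writes one (record id, error code) row to an error table, so one record can produce several entries, and a reference table maps codes to descriptions. The table names, validations, and error codes here are illustrative assumptions, not the framework's actual tables:

```python
ERROR_CODES = {                        # reference table for error codes
    "E001": "missing business key",
    "E002": "amount is not numeric",
}

def run_validations(rec):
    """Return the error codes a record fails; several can fail at once."""
    codes = []
    if not rec.get("key"):
        codes.append("E001")
    if not isinstance(rec.get("amount"), (int, float)):
        codes.append("E002")
    return codes

def stage_load(landing_records):
    """Move valid records to staging; log every failure to the error table."""
    staging, error_table, invalid_ids = [], [], set()
    for rec in landing_records:
        failures = run_validations(rec)
        if failures:
            invalid_ids.add(rec["id"])   # marked invalid in the landing area
            error_table += [(rec["id"], code) for code in failures]
        else:
            staging.append(rec)
    return staging, error_table, invalid_ids
```

Keeping one error row per failed validation (rather than one per record) preserves the full rejection history for the data quality team.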
Based on this feedback, the jobs are rerun/re-triggered manually.

2.3. Purging
Purging deletes previous records which are no longer required by a given business process. The purge logic is as follows:

2.3.1. Landing Area
1. Valid records – valid records that have been loaded into the staging area are retained for the previous 7 days only; the rest are purged.
2. Invalid records – invalid records that errored out of the staging area are retained for 30 days; the rest are purged.

2.3.2. Staging Area
Truncate and load. An area where we load and verify that the data is good before making any changes to the warehouse tables.

2.3.3. EDW
Data is maintained in the EDW depending on business need.

2.3.4. Datamart
Data is maintained in the datamart depending on business need.

2.4. Purge threshold
During purging, the business can set a threshold limit on the number of records being purged. If the threshold is crossed while deleting, the purge jobs are automatically aborted and a mail is sent to the EAM / Business data quality team for confirmation. Once the business confirms, the aborted jobs are triggered manually.

2.5. Appendix

2.5.1. About Target
TBU
2.5.2. Reference
The Exception Handling Overview is an extract from www.dwhinfo.com, written by Krishan.Vinayak@target.com.

2.5.3. Other Contributors
Krishan.Vinayak – Delivery Manager
Devanathan.Rajagopalan – Senior Technical Architect
Asis.Mohanty – BI Manager
Joseph.Raj – Technical Architect