This document provides guidelines for testing the quality of data, ETL processes, and SQL queries during the development of a data warehouse. It outlines steps to verify data extracted from source systems, transformed and loaded into staging tables, cleansed and consolidated in staging, and finally transformed and loaded into the data warehouse operational tables and data marts. The guidelines describe analyzing source data quality, verifying ETL processes, matching consolidated data, and transforming data according to business rules.
ETL and Data Test Guidelines for Large Applications
QA Guidelines for Data Warehouse Quality Verification
Wayne Yaddow, QA and Data Quality Analyst, 12/2009
This document describes testing guidelines and steps for verifying data, ETL processes, and SQL during the construction, unit testing, and system and integration testing of an application's data warehouse operational tables and data mart.
An Overview of Data Warehouse Testing
A data warehouse is a repository of transaction data that has been extracted from original sources and transformed so that query, analysis, and reporting on trends within historic data is both possible and efficient. The analyses provided by data warehouses may support an organization's strategic planning, decision support, or monitoring of outcomes of chosen strategies. Typically, data that is loaded into a data warehouse is derived from diverse sources of operational data, which may consist of data from databases, feeds, application files (such as office productivity software files), or flat files. The data must be extracted from these diverse sources, transformed to a common format, and loaded into the data warehouse.
Extraction, transformation, and loading (ETL) is a critical step in any data warehouse implementation, and continues to be an area of major importance over the life of the warehouse due to recurrent warehouse updating. Once a data warehouse is populated, front-end systems must be tested to facilitate querying, analysis, and reporting. The data warehouse front-end may provide a simple presentation of data based on queries, or may support sophisticated statistical analysis options. Data warehouses may have multiple front-end applications, depending on the various profiles within the user community.
An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:
1) The ETL layer
2) The full data warehouse
3) The front-end data warehouse applications
Each of these units must be treated separately and in combination, and since there may be multiple components in each (multiple feeds to ETL, multiple databases or data repositories that constitute the warehouse, and multiple front-end applications), each of these subsystems must be individually validated.
1.) Verify and Maintain the Data Low Level Design (LLD)
A first level of testing and validation begins with the formal acceptance of the logical data model and “low level design” (LLD). All further testing and validation will be based on the understanding of each of the data elements in the model.
Data elements that are created through a transformation or summary process must be clearly identified, and calculations for each of these data elements must be clear and easily interpreted.
During the LLD reviews and updates, special consideration should be given to typical modeling scenarios that exist in the project. Examples follow:
1. Verify that many-to-many attribute relationships are clarified and resolved.
2. Verify the types of keys that are used: surrogate keys versus natural keys.
3. Verify that the business analyst / DBA reviewed with the ETL architect and developers (application) the lineage and business rules for extracting, transforming, and loading the data warehouse.
4. Verify that all transformation rules, summarization rules, and matching and consolidation rules have clear specifications.
5. Verify that the transformations, business rules, and cleansing specified in the LLD and other application logic specs have been coded correctly in the ETL, Java, and SQL used for data loads.
6. Verify that procedures are documented to monitor and control data extraction, transformation, and loading. The procedures should describe how to handle exceptions and program failures.
7. Verify that data consolidation of duplicate or merged data was properly handled.
8. Verify that samplings of domain transformations will be taken to verify they are properly changed.
9. Compare unique values of key fields between source data and data loaded to the warehouse. This is a useful technique that points out a variety of possible data errors without doing a full validation on all fields (see the SQL sketch after this list).
10. Validate that target data types are as specified in the design and/or the data model.
11. Verify how sub-class/super-class attributes are depicted.
12. Verify that data field types and formats are specified.
13. Verify that defaults are specified for fields where needed.
14. Verify that processing for invalid field values in source is defined.
15. Verify that expected ranges of field contents are specified where known.
16. Verify that keys generated by the “sequence generator” are identified.
17. Verify that slowly changing dimensions are described.
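To illustrate item 9, the following is a minimal SQL sketch; the table and column names (src_account, dw_account, account_id) are hypothetical stand-ins for the actual source and warehouse objects:

    -- Key values present in the source but missing from the warehouse
    SELECT DISTINCT s.account_id
    FROM src_account s
    LEFT JOIN dw_account d ON d.account_id = s.account_id
    WHERE d.account_id IS NULL;

    -- Key values present in the warehouse but not in the source (unexpected rows)
    SELECT DISTINCT d.account_id
    FROM dw_account d
    LEFT JOIN src_account s ON s.account_id = d.account_id
    WHERE s.account_id IS NULL;

Any rows returned by either query point to extraction or load errors that warrant a defect report.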
2.) Analyze Source Data Before & After Extraction to Staging
Testers should extract representative data from each source file (before or after extract to staging tables) and confirm that the data is consistent with its definition; QA can discover any anomalies in how the data is represented and write defect reports where necessary. The objective is to discover data that does not meet the “data quality factors” described in specifications. See the list below and Table 1.
This verification process will also be used for temp tables used in a step process for data transformations, cleansing, etc.
• Verify that the scope of values in each column is within specifications
• Identify unexpected values in each field
• Verify relationships between fields
• Identify frequencies of values in columns and whether these frequencies make sense
Inputs: Application source data models and low level data design, data dictionaries, data attribute sources.
Outputs: Newly discovered attributes, undefined business rules, data anomalies such as fields used for multiple purposes.
Techniques and Tools: Data extraction software, business rule discovery software, data analysis tools.
Process Description:
1. Extract representative samples of data from each source or staging table.
2. Parse the data for the purpose of profiling.
3. Verify that not-null fields are populated as expected.
4. Structure discovery – Does the data match the corresponding metadata? Do field attributes of the data match expected patterns? Does the data adhere to appropriate uniqueness and null value rules?
5. Data discovery – Are the data values complete, accurate, and unambiguous?
6. Relationship discovery – Does the data adhere to specified required key relationships across columns and tables? Are there inferred relationships across columns, tables, or databases? Is there redundant data?
7. Verify that all required data from the source was extracted. Verify that the extraction process did not extract more or less data from the source than it should have.
8. Verify or write defects for exceptions and errors discovered during the ETL process.
9. Verify that the extraction process did not extract duplicate data from the source (this usually happens in repeatable processes where, at point zero, we need to extract all data from the source file, but during subsequent intervals we only need to capture the modified and new rows).
10. Validate that no data truncation occurred during staging.
11. Utilize a data profiling tool, or methods that show the range and value distributions of fields in the source data. This is used to identify any data anomalies from source systems that may be missed even when the data movement is correct.
12. Validation & Certification Method: it is sufficient to identify the requirements and count (via SQL) the number of rows that should be extracted from the source systems. The QA team will also count the number of rows in the result / target sets and match the two for validation. The QA team will maintain a set of SQL statements that are automatically run at this stage to validate that no duplicate data have been extracted from the source systems (a sketch of such statements follows this list).
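The following is a sketch of the kind of SQL statements item 12 describes, assuming a hypothetical source table src_orders, a staging table stg_orders, and a business key order_id:

    -- Row count of the source extract; should match the staging row count below
    SELECT COUNT(*) AS source_rows FROM src_orders;
    SELECT COUNT(*) AS staging_rows FROM stg_orders;

    -- Duplicate business keys in staging indicate the extract pulled the same rows twice
    SELECT order_id, COUNT(*) AS occurrences
    FROM stg_orders
    GROUP BY order_id
    HAVING COUNT(*) > 1;

In practice the count queries are usually extended with the incremental-load filter (for example, a last-modified date) so that only the rows expected in the current interval are compared.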
Table 1: Data Quality Factors

Data Consistency Issues:
• Varying Data Definitions: The data type and length for a particular attribute may vary in files or tables though the semantic definition is the same. Example: an account number may be defined as Number(9) in one field or table and Varchar2(11) in another table.
• Misuse of Integrity Constraints: When referential integrity constraints are misused, foreign key values may be left “dangling” or inadvertently deleted. Example: an account record is missing, but dependent records are not deleted.
• Nulls: Nulls appear where a field is defined as “not-null”. Example: the company has been entered as a null value for a business; a report of all companies would not list the business.

Data Completeness Issues:
• Missing Data: Data elements are missing due to a lack of integrity constraints or nulls that are inadvertently not updated. Example: an account date of estimated arrival is null, impacting an assessment of variances in estimated/actual account data.
• Inaccessible Data: Records are inaccessible due to missing or redundant identifier values. Example: business numbers are used to identify a customer record; because uniqueness was not enforced, the business ID (45656) identifies more than one customer.
• Missing Integrity Constraints: Missing constraints can cause data errors due to nulls, non-uniqueness, or missing relationships. Example: account records with a business identifier exist in the database but cannot be matched to an existing business.

Data Correctness Issues:
• Loss Projection: Tables that are joined over non-key attributes will produce non-existent data that is shown to the user. Example: Lisa Evans works in the LA office in the Accounting department; when a report is generated, it shows her working in the IT department.
• Incorrect Data Values: Data is misspelled or inaccurately recorded. Example: 123 Maple Street is recorded with a spelling mistake and a street abbreviation (123 Maple St).
• Inappropriate Use of Views: Data is updated incorrectly through views. Example: a view contains non-key attributes from base tables; when the view is used to update the database, null values are entered into the key columns of the base tables.
• Disabled Integrity Constraints: Null, non-unique, or out-of-range data may be stored when the integrity constraints are disabled. Example: the primary key constraint is disabled during an import function, and data is entered into the existing data with null unique identifiers.
• Non-duplication: Testing should be conducted to determine if there is duplication of data where there should not be. Example: duplicate rows or column data.
• Misuse of Integrity Constraints: Null or foreign key constraints may be inappropriate or too restrictive. Example: a check constraint only allows hard-coded values of “C”, “A”, “X”, and “Z”, so a new code “B” cannot be entered.

Data Comprehension Issues:
• Data Aggregation: Aggregated data is used to represent a set of data elements. Example: one name field is used to store surname, first name, middle initial, and last name (e.g., John, Hanson, Mr.).
• Cryptic Object Definitions: A database object (e.g., a column) has a cryptic, unidentifiable name. Example: a customer table with a column labeled “c_avd”, with no documentation as to what the column might contain.
• Unknown or Cryptic Data: Cryptic data is stored as codes, abbreviations, or truncated values with no apparent meaning. Example: shipping codes used to represent various parts of the customer base (‘01’, ‘02’, ‘03’), with no supporting document to explain the meaning of the codes.
• Accuracy: Data will be matched against business rules. Example: boundary values (lows, highs) will be identified for relevant fields and compared with expectations.
• Completeness: Data will be assessed to verify that all required data is present. Missing rows will be identified; null values will be identified in data elements where a value is expected.
• Precision: Precision testing is conducted to evaluate the level of data that is not sufficiently precise based on specifications.
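Several of the factors in Table 1 (Nulls, Non-duplication, Completeness) translate directly into simple profiling SQL. A minimal sketch, assuming a hypothetical staging table stg_customer whose customer_id must be unique and whose company_name is required:

    -- Completeness / Nulls: is a required field populated?
    SELECT COUNT(*) AS null_company_names
    FROM stg_customer
    WHERE company_name IS NULL;

    -- Non-duplication: the identifier should be unique
    SELECT customer_id, COUNT(*) AS occurrences
    FROM stg_customer
    GROUP BY customer_id
    HAVING COUNT(*) > 1;

Non-zero results from either query become candidate defect reports against the source data or the extraction logic.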
3.) Verify Corrected, Cleaned Source Data in Staging
This step works to improve the quality of existing data in source files, or “defects” that meet source specs but must be corrected before load.
Inputs:
Files or tables (staging) that require cleansing; data definition and business rule documents; data map of source files and fields; business rules; data anomalies discovered in earlier steps of this process.
Fixes for data defects that will result in data that does not meet specifications for the application DW.
Outputs: Defect reports, cleansed data, rejected or uncorrectable data
Techniques and Tools: Data reengineering, transformation, and cleansing tools; MS Access; Excel filtering.
Process Description: In this step, data with missing values, known errors, and suspect data is corrected. Automated tools may be identified to best locate and clean / correct large volumes of data.
1. Document the type of data cleansing approach taken for each data type in the repository.
2. Determine how “uncorrectable” or suspect data is processed, rejected, or maintained for corrective action. SMEs and stakeholders should be involved in the decision.
3. Review ETL defect reports to assess rejected data excluded from source files or information groups targeted for the warehouse.
4. Determine if data not meeting quality rules was accepted.
5. Document in defect reports the records and important fields that cannot be easily corrected.
6. Document records that were corrected and how they were corrected.
Certification Method: Validation of data cleansing processes could be a tricky proposition, but certainly doable. All data cleansing requirements should be clearly identified. The QA team should learn all data cleansing tools available and their methods. QA should create various conditions as specified in the requirements for the data cleansing tool to support and validate its results. QA will run a volume of real data through each tool to validate accuracy as well as performance.
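As one example of validating a cleansing result, the sketch below assumes a hypothetical rule that phone_number in stg_contact must be corrected to a 10-character value during cleansing; the table, column, and rule are illustrative only:

    -- Rows that survived cleansing but still violate the corrected-format rule
    SELECT contact_id, phone_number
    FROM stg_contact
    WHERE phone_number IS NULL
       OR LENGTH(phone_number) <> 10;

A stricter digits-only check would use the platform's regular-expression support (for example, REGEXP_LIKE on Oracle), and SQL Server uses LEN rather than LENGTH.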
4.) Verifying Matched and Consolidated Data
There are often ETL processes where data has been consolidated from various files into a single occurrence of a record. The cleaned and consolidated data can be assessed to verify matched and consolidated data.
Much of the ETL heavy lifting occurs in the transform step, where data is combined, data quality issues are handled, updates are applied, surrogate keys are assigned, and aggregates are built.
Inputs: Analysis of all files or databases for each entity type
Outputs:
Report of matched, consolidated, related data that is suspect or in error
List of duplicate data records or fields
List of duplicate data suspects
Techniques and Tools: Data matching techniques or tools; data cleansing software with matching and merging capabilities.
Process Description:
1. Establish match criteria for data. Select attributes to become the basis for possible duplicate occurrences (e.g., names, account numbers).
2. Determine the impact of incorrectly consolidated records. If there is a negative impact of consolidating two different occurrences, such as different customers, into a single customer record, submit defect reports. The fix should be higher controls to help avoid such consolidations in the future.
3. Determine the matching techniques to be used: exact character match in two corresponding fields, wild card match, key words, close match, etc.
4. Compare match criteria for a specific record with all other records within a given file to look for intra-file duplicate records (see the SQL sketch after this list).
5. Compare match criteria for a specific record with all records in another file to seek inter-file duplicate records.
6. Evaluate potential matched occurrences to assure they are, in fact, duplicates.
7. Verify that data consolidated into single occurrences is correct.
8. Examine and re-relate data related to old records being consolidated to the new occurrence-of-reference record. Validate that no related data was overlooked.
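A minimal SQL sketch for item 4, assuming a hypothetical consolidated table cons_customer and match criteria of customer name plus account number (both column names are illustrative):

    -- Intra-file duplicate suspects on the chosen match criteria
    SELECT customer_name, account_number, COUNT(*) AS occurrences
    FROM cons_customer
    GROUP BY customer_name, account_number
    HAVING COUNT(*) > 1;

Close-match or wild-card matching (item 3) generally requires a dedicated matching tool or fuzzy-matching functions rather than a plain GROUP BY.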
5.) Verify Transformed / Enhanced / Calculated Data to Target
Tables
At this stage, base data is being prepared for loading into the Application operational tables and
the data mart. This includes converting and formatting cleansed, consolidated data into the new
data architecture and possibly enhancing internal operational data with external data licensed
from service providers.
The objective is to successfully map the cleaned, corrected and consolidated data into the DW
environment.
Inputs: Cleansed, consolidated data; external data from service providers; business rules
governing the source data; business rules governing the target DW data; transformation rules
governing the transformation process; DW or target data architecture; data map of source data to
standardized data.
Output: Transformed, calculated, enhanced data; updated data map of source data to
standardized data; data map of source data to target data architecture
Techniques and Tools: Data transformation software; external or online or public databases.
Process Description:
1. Verify that the data warehouse construction team is using the data map of source data
to the DW standardized data, and verify the mapping itself.
2. Verify that the data transformation rules and routines are correct.
3. Verify the data transformations to the DW and assure that the processes were
performed according to specifications.
4. Verify that data loaded in the operational tables and data mart meets the definition of
the data architecture, including data types, formats, accuracy, etc.
5. Develop scenarios to be covered in Load Integration Testing.
6. Count Validation: Verify record counts using DWH back-end and reporting queries
against source and target as an initial check (see the SQL sketch following this list).
7. Dimensional Analysis: Verify that data integrity exists between the various source tables and
parent/child relationships.
8. Statistical Analysis: Validate the various calculations.
9. Data Quality Validation: Check for missing data, negatives, and consistency. Field-
by-field data verification will be done to check the consistency of source and target
data.
10. Granularity: Validate at the lowest granular level possible (lowest in the hierarchy,
e.g., Country-City-Sector); start with test cases at that level.
11. Dynamic Transformation Rules & Tables: Such methods need to be checked
continuously to ensure the correct transformation routines are executed. Verify that
dynamic mapping tables and dynamic mapping rules provide an easy, documented,
and automated way for transforming values from one or more sources into a standard
value presented in the DW.
12. Verification Method: The QA team will identify the detailed requirements as they
relate to transformation and validate the dynamic transformation rules and tables
against DW records. Utilizing SQL and related tools, the team will identify unique
values in source data files that are subject to transformation. The QA team identifies
the results of the transformation process and validates that such transformations have
accurately taken place.
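The count validation and field-by-field checks in items 6 and 9 can be sketched in SQL as shown below; the table names STG_SALES and DW_FACT_SALES, and the assumption that the listed columns map one-to-one without transformation, are hypothetical:

    -- Count validation: compare source and target row counts side by side.
    SELECT 'SOURCE' AS SIDE, COUNT(*) AS ROW_COUNT FROM STG_SALES
    UNION ALL
    SELECT 'TARGET' AS SIDE, COUNT(*) AS ROW_COUNT FROM DW_FACT_SALES;

    -- Field-by-field check: source rows whose mapped columns do not appear
    -- identically in the target (EXCEPT is MINUS on Oracle).
    SELECT SALE_ID, SALE_DATE, AMOUNT FROM STG_SALES
    EXCEPT
    SELECT SALE_ID, SALE_DATE, AMOUNT FROM DW_FACT_SALES;

Where columns are transformed rather than copied, the expected transformation logic must be applied to the source side of the comparison before the EXCEPT result is meaningful.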
6.) Front-end UI and Report Testing Using Operational Tables and
Data Mart
End user reporting is a major component of the Application Project. The report code may run
aggregate SQL queries against the data stored in the data mart and/or the operational tables, then
display results in a suitable format, either in a Web browser or on a client application interface.
Once the initial view is rendered, the reporting tool interface provides various ways of
manipulating the information such as sorting, pivoting, computing subtotals, and adding view
filters to slice-and-dice the information further. Special considerations such as those below will
be addressed while testing the reports:
1. The ETL process should be complete, the data mart must be populated and data
quality testing should be largely completed.
2. The front-end will use a SQL engine which will generate the SQL based on how
the dimension and fact tables are mapped. Additionally, there may be global or
report-specific parameters set to handle very large database (VLDB)-related
optimization requirements. As such, testing of the front-end will concentrate on
validating the SQL generated; this in turn validates the dimensional model and the
report specification vis-à-vis the design.
3. Unit testing of the reports will be conducted to verify the layout format per the design
mockup, style sheets, prompts and filters, attributes and metrics on the report.
4. Unit testing will be executed both in the desktop and Web environment.
5. System testing of the reports will concentrate on various report manipulation
techniques like the drilling, sorting and export functions of the reports in the Web
environment.
6. Reports and/or documents need special consideration for testing because they are
high-visibility reports used by the top analysts and because they have various charts,
gauges, and data points to provide visual insight into the performance of the
organization in question.
7. There may be some trending reports, more specifically called "comp" (comparison) reports, that
compare the performance of an organizational unit over multiple time periods.
Testing these reports needs special consideration, especially if a fiscal calendar is used
instead of a standard calendar for time-period comparison.
8. For reports containing derived metrics, special focus should be paid to any subtotals.
The subtotal row should use a "smart total," i.e., do the aggregation first and then do
the division, instead of adding up the individual cost-per-click values of each row in the
report (see the SQL sketch following this list).
9. Reports with "non-aggregate-able" metrics (e.g., inventory at hand) also need special
attention to the subtotal row. It should not, for example, add up the inventory for each
week and show the inventory of the month.
10. During unit testing, all data formats will be verified against the standard. For
example, metrics with monetary value should show the proper currency symbol,
decimal point precision (at least two places), and the appropriate positive or negative
sign. For example, negative numbers should be shown in red and enclosed in braces.
11. During system testing, while testing the drill-down capability of reports, care will be
taken to verify that the subtotal at the drill-down report matches with the
corresponding row of the summary report. At times, it is desirable to carry the parent
attribute to the drill-down report; verify the requirements for this.
12. When testing reports containing conditional metrics, care will be taken to check for
"outer join condition;" i.e., nonexistence of one condition is reflected appropriately
with the existence of the other condition.
13. Reports with multilevel sorting will get special attention for testing especially if the
multilevel sorting includes both attributes and metrics to be sorted.
14. Reports containing metrics at different dimensionality and with percent-to-total
metrics and/or cumulative metrics will get special attention to check that the
subtotals are hierarchy-aware (i.e., they "break" or re-initialize at the appropriate
levels).
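For the "smart-total" expectation in item 8, a hedged SQL sketch of the correct aggregation order follows; the fact table DW_FACT_AD_PERFORMANCE and its columns are hypothetical stand-ins for a derived cost-per-click metric:

    -- Smart total: aggregate cost and clicks first, then divide, rather than
    -- summing or averaging the per-row cost-per-click values.
    SELECT CAMPAIGN_ID,
           SUM(COST)                            AS TOTAL_COST,
           SUM(CLICKS)                          AS TOTAL_CLICKS,
           SUM(COST) / NULLIF(SUM(CLICKS), 0)   AS COST_PER_CLICK
    FROM   DW_FACT_AD_PERFORMANCE
    GROUP  BY CAMPAIGN_ID;

The subtotal shown on the report should match the value produced this way, not the sum of the individual row-level ratios.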
7.) Operational Table and Data Mart: Build Sanity Test
1. Session Completions: All workflow sessions completed successfully using the Log
Viewer.
2. Source to Target Counts: This process verifies that the number of records in the source
system matches the number of records received, and ultimately processed, into the data
warehouse. If look-ups are involved in the ETL process, the count between source and
target will not match. The ETL Session log and target table counts are compared.
3. Source to Target Data Verification: The process verifies that all source and reference tables
have data before running ETLs. We verify that all target tables were truncated before the
load unless target tables are updated. This process also verifies that source field length
thresholds do not cause values to be truncated during the transformation or loading of data.
4. Field to Field Verification: This process verifies the field values from the source system to
target. This process ensures that the data mapping from the source system to the target is
correct, and that data sent has been loaded accurately.
5. ETL Exception Processing: Exception processing verification looks for serious data errors
that would cause system processing failures or data corruption. An Exception report
verifying the number and types of errors encountered is produced and reviewed for
additional processing and / or reporting to the customer.
There are two primary types of exception processing:
1. Database Exception:
• Not Null - Source column is null while target is not null
• Reference Key - The records coming from the source data do not have a
corresponding parent key in the parent table.
• Unique Key - The record already exists in the target table.
• Check Constraint - CHECK constraints enforce domain integrity by limiting the
values that are accepted by a column.
2. Business Exception
These are exceptions thrown based on certain business rules defined for specific data
elements or groups of data elements.
• The ETL process utilizes a single Exception Table to capture the exceptions from various
ETL sessions and an Error Lookup table, which contains the various error codes and their
descriptions.
• We check the Exception process using the Session Log and the Exception Table; a SQL sketch of typical database exception checks is shown below.
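The database exceptions listed above translate naturally into SQL probes. The sketch below is illustrative only; DW_FACT_SALES, DW_DIM_CUSTOMER, and the key columns are hypothetical:

    -- Reference key exception: fact rows with no matching parent in the dimension.
    SELECT f.SALE_ID, f.CUSTOMER_KEY
    FROM   DW_FACT_SALES f
    LEFT JOIN DW_DIM_CUSTOMER d
           ON f.CUSTOMER_KEY = d.CUSTOMER_KEY
    WHERE  d.CUSTOMER_KEY IS NULL;

    -- Not Null exception: target column left NULL where the mapping requires a value.
    SELECT SALE_ID
    FROM   DW_FACT_SALES
    WHERE  SALE_DATE IS NULL;

    -- Unique key exception: the same natural key loaded more than once.
    SELECT SALE_ID, COUNT(*) AS OCCURRENCES
    FROM   DW_FACT_SALES
    GROUP  BY SALE_ID
    HAVING COUNT(*) > 1;

Rows returned by these probes would be recorded in the Exception Table with the corresponding error codes from the Error Lookup table.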
8.) Sanity Test: Exit and Suspension Criteria
1. No critical defects unfixed; no more than 3 high-severity defects.
2. 80% or more of build functionality can be tested – functionality might fail because of
Java / report code.
3. Platform performance is such that the test team can productively work to schedule.
4. Fewer than 15% of build fixes failed.