ETL QA
 

    ETL QA Document Transcript

    • 1. What are Critical Success Factors?
      Critical Success Factors (CSFs) are the key areas of activity in which favorable results are necessary for a company to reach its goal. There are four basic types of CSFs:
      - Industry CSFs
      - Strategy CSFs
      - Environmental CSFs
      - Temporal CSFs

      2. What is data cube technology used for?
      Data cubes are commonly used for easy interpretation of data. A cube represents data along dimensions, together with the measures the business needs. Each dimension of the cube represents an attribute of the database, e.g. profit per day, month, or year.

      3. What is data cleaning?
      Data cleaning, also known as data scrubbing, is a process that ensures a set of data is correct and accurate. Data accuracy, consistency, and integration are checked during data cleaning. It can be applied to a single set of records or to multiple sets of data that need to be merged.

      4. Explain how to mine an OLAP cube.
      A data mining extension can be used to slice the data in the source cube according to what data mining discovers. When a cube is mined, the case table is a dimension.

      5. What are the different stages of “Data mining”?
      Data mining is a logical process of searching large amounts of information to find important data.
      Stage 1: Exploration. The data is explored and prepared. The goal of this stage is to find the important variables and determine their nature.
      Stage 2: Pattern identification. The primary action in this stage is searching for patterns and choosing the one that allows the best prediction.
      Stage 3: Deployment. This stage cannot be reached until a consistent, highly predictive pattern is found in stage 2. The pattern found in stage 2 is then applied to see whether the desired outcome is achieved.
    • 6. What are the different problems that “Data mining” can solve?
      Data mining can be used in a variety of fields and industries, such as marketing of products and services, AI, and government intelligence. The US FBI uses data mining for security and intelligence screening, to identify illegal and incriminating electronic information distributed over the internet.

      7. What is data purging?
      Deleting data from a data warehouse is known as data purging. Usually junk data, such as rows with null values or spaces, is cleaned up. Data purging is the process of cleaning out these junk values.

      8. What is a BUS schema?
      A BUS schema identifies the common dimensions across business processes, that is, the conformed dimensions. It has conformed dimensions and standardized definitions of facts.

      9. Define non-additive facts.
      Non-additive facts are facts that cannot be summed across any of the dimensions present in the fact table. Adding these columns does not produce a meaningful result.

      10. What is a conformed fact? What are conformed dimensions used for?
      A conformed fact has the same name and definition in separate tables, so its values can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts and have a static structure. Any dimension table that is used by multiple fact tables can be a conformed dimension.

      11. What is real-time data warehousing?
      In real-time data warehousing, the warehouse is updated every time the system performs a transaction, so it reflects the real-time business data. When a query is run against the warehouse, it returns the state of the business at that moment.

      Explain the use of lookup tables and aggregate tables.
      An aggregate table contains a summarized view of the data. Lookup tables, using the primary key of the target, allow records to be updated based on the lookup condition.
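The aggregate and lookup tables described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (`sales_fact`, `sales_by_product`, `customer_dim`), the sample rows, and the `upsert_customer` helper are all hypothetical, chosen only for illustration; a real ETL tool would normally provide its own lookup transformation.

```python
import sqlite3

# All table names and sample rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, product TEXT, amount REAL);
    INSERT INTO sales_fact (product, amount) VALUES ('pen', 2.0), ('pen', 3.0), ('book', 10.0);

    -- Aggregate table: one summarized row per product.
    CREATE TABLE sales_by_product AS
        SELECT product, COUNT(*) AS sale_count, SUM(amount) AS total_amount
        FROM sales_fact
        GROUP BY product;

    -- Target table whose primary key drives the lookup.
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO customer_dim VALUES (1, 'Pune');
""")


def upsert_customer(conn, customer_id, city):
    """Look up the target row by primary key, then update it or insert a new one."""
    found = conn.execute(
        "SELECT 1 FROM customer_dim WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if found:
        conn.execute(
            "UPDATE customer_dim SET city = ? WHERE customer_id = ?", (city, customer_id)
        )
        return "updated"
    conn.execute("INSERT INTO customer_dim VALUES (?, ?)", (customer_id, city))
    return "inserted"


aggregate_rows = conn.execute(
    "SELECT product, sale_count, total_amount FROM sales_by_product ORDER BY product"
).fetchall()
first_action = upsert_customer(conn, 1, "Mumbai")  # key 1 already exists
second_action = upsert_customer(conn, 2, "Delhi")  # key 2 is new
print(aggregate_rows, first_action, second_action)
```

Queries that only need totals per product can read the small `sales_by_product` table instead of scanning every detail row, which is the usual reason for keeping an aggregate table.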
    • Define slowly changing dimensions (SCD).
      SCDs are dimensions whose data changes very slowly, e.g. a city or an employee. When a row in such a dimension changes, it can be replaced completely without keeping any track of the old record (commonly called Type 1), a new row can be inserted (Type 2), or the change can be tracked within the row (Type 3).

      What is cube grouping?
      A set of similar cubes built by a transformer is known as a cube group. Cube groups are generally used to create smaller cubes based on the data in a level of a dimension.

      What is data warehousing?
      A data warehouse can be considered a storage area where relevant data is stored irrespective of its source. Data warehousing merges data from multiple sources into an easy and complete form.

      What is virtual data warehousing?
      A virtual data warehouse provides a collective view of the completed data. It can be considered a logical data model containing the metadata.

      What is active data warehousing?
      An active data warehouse represents a single state of the business. It considers the analytic perspectives of customers and suppliers, and it helps deliver updated data through reports.

      What is data modeling and data mining?
      Data modeling is a technique used to define and analyze the data requirements that support an organization's business processes. In simple terms, it is the analysis of data objects in order to identify the relationships among them in a business.
      Data mining is a technique used to analyze datasets to derive useful insights. It is mainly used in retail, consumer goods, telecommunication, and financial organizations that have a strong consumer orientation, in order to determine the impact on sales, customer satisfaction, and profitability.

      What is the difference between data warehousing and business intelligence?
      Data warehousing relates to all aspects of data management, from the development and implementation to the operation of the data sets. It is a store of all data relevant to the business.
    • Business Intelligence is used to analyze the data from a business point of view in order to measure an organization's success. Factors such as sales, profitability, marketing campaign effectiveness, market share, and operational efficiency are analyzed using Business Intelligence tools like Cognos, Informatica etc.

      What is a snapshot in a data warehouse?
      A snapshot is a complete view of the data at the time of extraction. It occupies less space and can be used to back up and restore data quickly.

      What is the ETL process in data warehousing?
      ETL stands for extraction, transformation, and loading: extracting data from different sources such as flat files, databases, or XML data, transforming this data according to the application's needs, and loading it into a data warehouse.

      Explain the difference between data mining and data warehousing.
      Data mining is a method for comparing large amounts of data for the purpose of finding patterns. It is normally used for models and forecasting. A data warehouse is the central repository for the data of several business systems in an enterprise. Data from various sources is extracted and organized in the data warehouse selectively, for analysis and accessibility.

      What is an OLTP system and an OLAP system?
      OLTP = OnLine Transaction Processing. An OLTP system supports and manages applications whose transactions involve high volumes of data. OLTP is based on a client-server architecture and supports transactions across networks.
      OLAP = OnLine Analytical Processing. OLAP performs business data analysis and complex calculations on low volumes of transactions. With the support of OLAP, a user can gain insight into data coming from various sources.
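The extract, transform, and load steps described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the flat-file contents in `RAW_CSV`, the target table name `emp_dw`, and the cleaning rules (trim whitespace, title-case names, treat an empty salary as NULL) are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real job would read this from disk.
RAW_CSV = "eno,ename,sal\n1, alice ,1000\n2,bob,\n3, carol ,2500.50\n"


def extract(text):
    """Extract: parse the flat file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, title-case them, and turn empty salaries into NULL."""
    cleaned = []
    for row in rows:
        sal = row["sal"].strip()
        cleaned.append(
            (int(row["eno"]), row["ename"].strip().title(), float(sal) if sal else None)
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emp_dw (eno INTEGER PRIMARY KEY, ename TEXT, sal REAL)"
    )
    conn.executemany("INSERT INTO emp_dw VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT eno, ename, sal FROM emp_dw ORDER BY eno").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes each one testable on its own, which is also how ETL test cases are usually organized.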
    • What are cubes?
      In data warehousing, multidimensional data is logically represented by cubes. OLAP environments view the data in the form of a hierarchical cube. A data cube stores data in a summarized form, which allows faster analysis and makes reporting easy.

      What is analysis service?
      Analysis service provides a combined view of the data used in OLAP or data mining.

      Explain the sequence clustering algorithm.
      The sequence clustering algorithm groups similar or related paths, that is, sequences of data containing events.

      Explain the time series algorithm in data mining.
      The time series algorithm can be used to predict continuous values of data. Once the algorithm has been trained to predict one series of data, it can predict the outcome of other series, e.g. to forecast profit.

      What is XMLA?
      XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical systems, such as OLAP.

      What is a surrogate key? Explain it with an example.
      A surrogate key is a unique identifier in a database, either for an entity in the modeled world or for an object in the database. It is generated internally by the current system and is invisible to the user. Because it carries no business meaning, a surrogate key is commonly used as the primary key of a dimension table in place of the natural key. E.g. a sequential number can be a surrogate key.

      What is the purpose of a factless fact table?
      Factless fact tables are used to track a process or to record a status. They have no numeric measures to aggregate.

      What is the level of granularity of a fact table?
      The granularity is the lowest level of information stored in the fact table. The depth of the data level is known as granularity.
    • E.g. in a date dimension, the level of granularity could be year, quarter, month, period, week, or day. The process consists of the following two steps:
      - Determining the dimensions that are to be included
      - Determining where to place the hierarchy of each dimension of information

      Difference between star and snowflake schema.
      A snowflake schema is a more normalized form of a star schema. In a star schema, one fact table is stored with a number of dimension tables. In a snowflake schema, one dimension table can have multiple sub-dimensions. This means that in a star schema, each dimension table is independent, without any sub-dimensions.

      What is the difference between a view and a materialized view?
      View:
      - Provides a tailored representation of the data in its underlying table.
      - Has a logical structure only and does not occupy space.
      - Changes made through it affect the corresponding tables.
      Materialized view:
      - Holds precalculated data that persists.
      - Occupies physical storage space.
      - Changes made to it do not affect the corresponding tables.

      What is a linked cube with reference to a data warehouse?
      Linked cubes are cubes that are linked so that the data remains consistent.

      1. What is the difference between OLAP and OLTP?
      2. Tell me about your ETL workflow process.
      3. What is the difference between an operational database and a warehouse?
      4. What type of approach do you follow in your project?
      5. What is the difference between a data mart and a data warehouse?
      6. Which type of database are you using in your project, and how much space does it use?
      7. Explain the test case template.
      8. What is the difference between severity and priority?
    • 9. What is the difference between SDLC and STLC?
      10. What is the difference between an issue log and a clarification log?
      11. What types of bugs have you faced in your project?
      12. What is banking?
      13. Explain the types of banking.
      14. What is the difference between a dimension table and a fact table?
      15. Explain SCDs and their types. How are they used?
      16. Explain bug reporting.
      17. Are you using any models in SDLC?
      18. Which process is used in ETL testing?
      19. What is unit testing? Who does it?
      20. What is the difference between incremental load and initial load?
      21. Which document did you use to carry out your project?
      22. Are you using the Requirements tab in QC?

      Types of ETL Bugs
      1. User interface bugs/cosmetic bugs: related to the GUI of the application, such as navigation, spelling mistakes, font style, font size, colors, and alignment.
      2. BVA (boundary value analysis) related bugs: minimum and maximum values.
      3. ECP (equivalence class partitioning) related bugs: valid and invalid types.
      4. Input/output bugs: valid values are not accepted; invalid values are accepted.
      5. Calculation bugs: mathematical errors; the final output is wrong.
      6. Load condition bugs:
    • Does not allow multiple users; does not support the load the customer expects.
      7. Race condition bugs: system crashes and hangs; the system cannot run on client platforms.
      8. Version control bugs: no matching logo; no version information available. These usually occur in regression testing.
      9. H/W bugs: a device is not responding to the application.
      10. Source bugs: mistakes in help documents.

      Types of ETL Testing
      1) Constraint Testing: In constraint testing, the test engineer checks whether the data is mapped correctly from source to target by verifying the following constraints:
      a) NOT NULL
      b) UNIQUE
      c) Primary key
      d) Foreign key
      e) Check
      f) Default
      g) NULL
      2) Source to Target Count Testing: This checks whether the number of records in the source matches the number in the target. The order of the data, ascending or descending, does not matter; only the count is required. A tester can use this type of testing when time is short.
      3) Source to Target Data Validation Testing: In this testing, the tester validates each and every data point between source and target. In most financial projects, the tester also checks decimal precision.
      4) Threshold/Data Integrated Testing: In this testing, the tester checks the ranges of the data. It is typically used for population calculations, share market data, and business finance analysis (quarterly, half-yearly, yearly).
      MIN  MAX  RANGE
      4    10   6
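As a rough illustration of the count and threshold checks described above, the sketch below compares row counts between a source and a target table and computes MIN, MAX, and RANGE for one column. The tables `src_emp` and `tgt_emp`, their contents, and the helper names are hypothetical; the sample values are chosen to reproduce the 4 / 10 / 6 example.

```python
import sqlite3

# Hypothetical source and target tables with the same rows in a different order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_emp (eno INTEGER, sal INTEGER);
    CREATE TABLE tgt_emp (eno INTEGER, sal INTEGER);
    INSERT INTO src_emp VALUES (1, 4), (2, 7), (3, 10);
    INSERT INTO tgt_emp VALUES (3, 10), (1, 4), (2, 7);
""")


def row_count(conn, table):
    # Table names cannot be bound as SQL parameters; only trusted,
    # hard-coded names are passed in this sketch.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def counts_match(conn, source, target):
    """Source to target count test: row order is irrelevant, only the count matters."""
    return row_count(conn, source) == row_count(conn, target)


def value_range(conn, table, column):
    """Threshold test: return (min, max, range) for one column."""
    low, high = conn.execute(
        f"SELECT MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    return low, high, high - low


print(counts_match(conn, "src_emp", "tgt_emp"))
print(value_range(conn, "tgt_emp", "sal"))
```

A matching count does not prove the data is identical, which is why the document lists source to target data validation as a separate test.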
    • 5) Field to Field Testing: In field to field testing, the test engineer checks how much space each field occupies in the database and whether the data in the table agrees with the column data types.
      Note: Also check the order of the columns and the mapping of each source column to its target column.
      6) Duplicate Check Testing: In this phase of ETL testing, the tester encounters duplicate values very frequently. Because a huge amount of data is present in the source and target tables, the tester uses database queries to find them:
      SELECT ENO, ENAME, SAL, COUNT(*) FROM EMP GROUP BY ENO, ENAME, SAL HAVING COUNT(*) > 1;
      Note:
      1) If the primary key is defined incorrectly, or no primary key is assigned, duplicates may arise.
      2) Sometimes a developer makes mistakes while transferring the data from source to target, and duplicates may arise.
      3) Duplicates can also arise from environment mistakes (for example, improper plugins in the tool).
      7) Error/Exception Logical Testing:
      1) The delimiter is available in valid tables.
      2) The delimiter is not available in invalid tables (exception tables).
      8) Incremental and Historical Process Testing: An incremental load must not corrupt the historical data. If the historical data is corrupted, a bug is raised.
      9) Control Columns and Defect Values Testing: This is introduced by IBM.
      10) Navigation Testing: Navigation testing is testing from the end user's point of view. If an end user cannot easily find their way around the application, the navigation is called bad or poor navigation. During testing, the tester identifies such navigation scenarios so that unnecessary navigation can be avoided.
      11) Initialization Testing: Testing the combination of hardware and software installed on a platform is called initialization testing.
      12) Transformation Testing: If, when mapping from the source table to the target table, a transformation does not match the mapping condition, the test engineer raises a bug.
      13) Regression Testing:
    • Modifying code to fix a bug or to implement new functionality can introduce new errors. These introduced errors are called regressions, and checking for them is called regression testing.
      14) Retesting: Re-executing the failed test cases after the bug has been fixed.
      15) System Integration Testing: After the programming process is complete, the developer integrates the modules. There are 3 models:
      a) Top Down
      b) Bottom Up
      c) Hybrid

      Project
      Here I am taking the emp table as an example. For this I will write test scenarios and test cases, which means we are testing the emp table.

      Check List or Test Scenarios:
      1. To validate the data in the table (emp).
      2. To validate the table structure.
      3. To validate the null values of the table.
      4. To validate the null values of every attribute.
      5. To check the duplicate values of the table.
      6. To check the duplicate values of each attribute of the table.
      7. To check the field value or space (length of the field size).
      8. To check the constraints (foreign key, primary key).
      9. To check the names of the employees who have not earned any commission.
      10. To check all employees who work in a given dept no (Accounts dept, Sales dept).
      11. To check the row count of each attribute.
      12. To check the row count of the table.
      13. To check the max salary from the emp table.
      14. To check the min salary from the emp table.
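A few of the checklist scenarios above can be expressed as queries against a small emp table. The sketch below runs them through Python's sqlite3 module; the column layout (`eno`, `ename`, `sal`, `comm`, `deptno`) and the sample rows are assumptions for illustration, and the duplicate query is the same GROUP BY ... HAVING pattern used in the duplicate check test.

```python
import sqlite3

# Hypothetical emp table with one duplicated row and two NULL commissions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (eno INTEGER, ename TEXT, sal REAL, comm REAL, deptno INTEGER);
    INSERT INTO emp VALUES
        (1, 'SMITH', 800, NULL, 20),
        (2, 'ALLEN', 1600, 300, 30),
        (3, 'WARD', 1250, 500, 30),
        (3, 'WARD', 1250, 500, 30),
        (4, 'KING', 5000, NULL, 10);
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Scenario 12: row count of the table.
row_count = scalar("SELECT COUNT(*) FROM emp")

# Scenario 4: null values of one attribute (repeat per column as needed).
null_comm = scalar("SELECT COUNT(*) FROM emp WHERE comm IS NULL")

# Scenario 5: duplicate rows in the table.
duplicates = conn.execute(
    "SELECT eno, ename, sal, COUNT(*) FROM emp "
    "GROUP BY eno, ename, sal HAVING COUNT(*) > 1"
).fetchall()

# Scenario 9: employees who have not earned any commission.
no_commission = [
    name
    for (name,) in conn.execute(
        "SELECT ename FROM emp WHERE comm IS NULL OR comm = 0 ORDER BY ename"
    )
]

# Scenarios 13 and 14: max and min salary.
max_sal = scalar("SELECT MAX(sal) FROM emp")
min_sal = scalar("SELECT MIN(sal) FROM emp")

print(row_count, null_comm, duplicates, no_commission, max_sal, min_sal)
```

Each query maps to one test case, so the expected result can be recorded next to the scenario number in the test case template.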
    • http://etltestingguide.blogspot.com/p/sql.html

      What is the difference between an ODS and a staging area?

      ODS: An Operational Data Store, which contains data and comes after the staging area. For example, suppose we have day-level granularity in the OLTP system and year-level granularity in the data warehouse. If the business (manager) asks for week-level granularity, we would have to go to the OLTP system and summarize the day level to the week level, which would be painstaking. So instead we maintain week-level granularity in the ODS for the data, for about 30 to 90 days. Note: the ODS contains cleansed data only, i.e. data that has passed through the staging area.
      Staging area: It comes after the ETL has finished. The staging area consists of:
      1. Metadata.
      2. The work area where we apply our complex business rules.
      3. A place to hold the data and do calculations.
      In other words, it is a temporary work area.

      The full form of ODS is Operational Data Store. The ODS is a layer between the source and target databases and is used to store recent data. The staging layer is also a layer between the source and target databases. It is used for cleansing and stores the data periodically.

      The ODS (Operational Data Store) is the first point in the data warehouse. It stores the real-time data of daily transactions as the first instance of the data. The staging area is the later part, which comes after the ODS. Here the data is cleansed and temporarily stored before it is loaded into the data warehouse.

      The ODS contains real-time data (because any changes should be applied to real-time data), so the real-time data is dumped into the ODS, called the landing area. Later the data moves into the staging area, which is where all transformations are done.
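The week-level ODS example in the first answer above (day-level rows in OLTP summarized to week level) can be sketched as a single GROUP BY. This is an illustration under stated assumptions: the table names `oltp_sales` and `ods_weekly_sales` and the sample rows are hypothetical, and the week key uses SQLite's `strftime('%W')`, which numbers weeks starting on Monday.

```python
import sqlite3

# Hypothetical day-level OLTP rows; 2024-01-01 is a Monday.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oltp_sales (sale_day TEXT, amount INTEGER);
    INSERT INTO oltp_sales VALUES
        ('2024-01-01', 10), ('2024-01-03', 5), ('2024-01-07', 2),
        ('2024-01-08', 7), ('2024-01-10', 3);

    -- ODS table kept at week-level granularity.
    CREATE TABLE ods_weekly_sales AS
        SELECT strftime('%Y-%W', sale_day) AS year_week,
               SUM(amount) AS total_amount
        FROM oltp_sales
        GROUP BY strftime('%Y-%W', sale_day);
""")

weekly = conn.execute(
    "SELECT year_week, total_amount FROM ods_weekly_sales ORDER BY year_week"
).fetchall()
print(weekly)
```

With the weekly table in place, a week-level report reads two rows instead of re-summarizing every daily transaction from the OLTP system.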