IBM InfoSphere Information Analyzer is a tool for data profiling, data quality assessment, analysis and monitoring. It provides column analysis, primary key analysis, foreign key analysis, and cross-domain analysis, along with data quality assessment, monitoring, and rule design. Features include advanced analysis and monitoring, integrated rules analysis, and support for heterogeneous data. It helps users understand data structure, relationships, and quality.
InfoSphere Information Analyzer
Data quality assessment, analysis and monitoring
Information Analyzer is an IBM product widely used for data profiling. It helps you understand data structure, format, relationships and quality, and supports ongoing quality monitoring. Information Analyzer is also referred to as WebSphere Information Analyzer.
Information Analyzer has extensive data profiling capabilities and provides a user interface with a set of controls designed to integrate into the development workflow. The four major data profiling functions within Information Analyzer are:
Column Analysis: Generates a full frequency distribution and examines column values to infer properties and definitions such as statistical measures and domain values.
Primary Key Analysis: Identifies candidate keys for one or more tables and aids in testing columns or column combinations to determine whether a candidate is suitable as a primary key.
Foreign Key Analysis: Examines the relationships and contents across tables, thereby identifying foreign keys and checking referential integrity.
Cross-Domain Analysis: Identifies overlap in values between columns and redundancy of data between tables.
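For readers who want a feel for what column analysis computes, the following is a minimal pandas sketch of the same idea (frequency distribution, completeness, cardinality) run outside the product; the file name and its columns are hypothetical and this is not the Information Analyzer implementation.

    # Illustrative only: a rough equivalent of column-analysis statistics.
    # "customers.csv" and its columns are hypothetical examples.
    import pandas as pd

    df = pd.read_csv("customers.csv")

    for col in df.columns:
        series = df[col]
        profile = {
            "inferred_type": str(series.dtype),
            "null_count": int(series.isna().sum()),
            "distinct_values": int(series.nunique(dropna=True)),
            "top_values": series.value_counts(dropna=True).head(5).to_dict(),
        }
        print(col, profile)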
IBM InfoSphere Information Analyzer provides data quality assessment, data quality monitoring and
data rule design and analysis capabilities. This software helps you derive more meaning from your
enterprise data, reduces the risk of proliferating incorrect information, facilitates the delivery of trusted
content, and helps to lower data integration costs.
InfoSphere Information Analyzer features include:
• Advanced analysis and monitoring provides source system profiling and analysis capabilities to help
you classify and assess your data.
• Integrated rules analysis uses data quality rules for greater validation, trending and pattern analysis.
• Scalable, collaborative platform enables sharing of information and results across the enterprise.
• Support for heterogeneous data enables you to assess information over a wide range of systems
and data sources.
Methodology and best practices
Use IBM InfoSphere Information Analyzer to understand the content, structure, and overall quality of your data at a given point in time.
The analysis methodology and best practices provide deeper insight into the analytical methods that IBM InfoSphere Information Analyzer employs to analyze source data and rules.
The information is organized by analytical function. It gives you both in-depth knowledge and best
practices for:
• Data analysis, including:
o Applying data analysis system functionality
o Applying data analysis techniques within a function
o Interpreting data analysis results
o Making decisions or taking actions based on analytical results
• Data quality analysis and monitoring, including:
o Supporting business-driven rule definition and organization
o Applying rules and reusing them consistently across data sources
o Leveraging multi-level rule analysis to understand broader data quality issues
o Evaluating rules against defined benchmarks/thresholds
o Assessing and annotating data quality results
o Monitoring trends in data quality over time
o Deploying rules across environments
o Running ad hoc, scheduled, or command-line execution options
Analyzing data by using data rules
The topics in this section describe how to define and execute data rules, which evaluate or validate
specific conditions associated with your data sources. Data rules can be used to extend your data
profiling analysis, to test and evaluate data quality, or to improve your understanding of data integration
requirements.
To work with data rules, always start by going to the Develop navigator menu in the console and selecting Data Quality. This takes you to the starting point for creating and working with data rule functionality.
From the Data Quality workspace you can:
• Create data rule definitions, rule set definitions, data rules, rule sets, and metrics
• Build data rule definition, rule set definition, and metric logic
• Create data rule definition and rule set definition associations
• Associate a data rule definition, rule set definition, metric, data rule, or rule set with folders
• Associate a data rule definition, rule set definition, metric, data rule, or rule set with IBM®
InfoSphere™ Business Glossary terms, policies, and contacts
• Build data rule definitions or rule set definitions by using the rule builder
• Add a data rule definition with the free form editor
Characteristics of data rule functionality
You can use data rules to evaluate and analyze conditions found during data profiling, to conduct a data
quality assessment, to provide more information to a data integration effort, or to establish a
framework for validating and measuring data quality over time.
You can construct data rules in a generic fashion through the use of rule definitions. These definitions describe the rule evaluation or condition. By associating physical data sources with the definition, a data rule can be run to return analysis statistics and detail results.
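As a minimal sketch of that separation between generic rule logic and bound data (plain pandas, not the Information Analyzer API; all names are hypothetical):

    # Illustrative sketch: a rule definition holds generic logic over a logical
    # variable; a binding maps that variable to a physical column before execution.
    import pandas as pd

    def not_blank(value):
        # Rule logic: the value exists and is not an empty string.
        return pd.notna(value) and str(value).strip() != ""

    rule_definition = {"name": "first_name_populated",
                       "variable": "first_name",
                       "logic": not_blank}

    # Hypothetical physical source and a binding of the logical variable to a column.
    source = pd.DataFrame({"FIRSTNAME": ["Fred", "", None, "Ann"]})
    binding = {"first_name": "FIRSTNAME"}

    bound_column = source[binding[rule_definition["variable"]]]
    met = bound_column.map(rule_definition["logic"])
    print("met:", int(met.sum()), "not met:", int((~met).sum()))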
Creating a rule definition
Creating a rule definition requires two components: a name for the rule definition and a logical
statement (the rule logic) about what the rule definition tests or evaluates. Incomplete, empty, or
invalid data values affect the quality of the data in your project by interrupting data integration
processes and by using up memory on source systems. You can create rule definitions to analyze data
for completeness and validity to find these anomalies.
You can create a rule definition by defining the name and description of the rule, and by using the free
form editor or rule logic builder to complete the rule logic for the rule definition.
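For example, the rule logic for a simple completeness check on a first-name variable might read as shown below; treat the exact syntax as an assumption and confirm the available checks and functions against the palette in your rule builder.

    first_name EXISTS AND len(trim(first_name)) <> 0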
Procedure
1. From the Develop icon in the Navigator menu in the console, select Data Quality.
2. Click New Rule Definition in the Tasks list, located on the right side of the screen.
3. Enter a name for the new rule definition in the Name field.
4. Optional: Enter a brief description of your rule definition in the Short Description field.
5. Optional: Enter a longer description of your rule definition in the Long Description field.
6. Optional: In the Validity Benchmark section, check Include Benchmark to set benchmarks to
check the validity of your data.
7. Click Save.
Generating a data rule from a rule definition
After you create rule definition logic, you can create a data rule to analyze real data in your projects.
Procedure
1. From the Develop icon on the Navigator menu, select Data Quality.
2. Highlight the rule definition that you want to generate a data rule from.
3. In the Tasks menu on the right side of the screen, click Generate Data Rule or Rule Set.
4. On the Overview tab, type a name for the data rule. The name must contain at least one character and cannot contain the slash (/) character. The name must be unique in your project.
5. Optional: Type a short description and long description of your data rule. The Created By, Created On, and Last Modified fields are automatically populated after you create and save your data rule. You can optionally provide information in the Owner and Data Steward fields.
6. Decide whether you would like to set a validity benchmark for your data rule. Benchmarks quantify and monitor the quality of your data. Click the Monitor Records Flagged by One or More Rules check box in the Validity Benchmark box if you would like to monitor records that are flagged by other rules in your project.
7. At the top of the workspace, switch from the Overview tab to the Bindings and Output tab.
8. Click Save to create the data rule.
Setting benchmarks for data rules
You can set a validity benchmark either when you initially create a rule definition or when you generate the data rule, in order to quantify and monitor the quality of your data.
Validity benchmark
The validity benchmark establishes the level of tolerance you have for exceptions to the data rule. The
benchmark indicates whether sufficient records have met or not met the rule in order to mark a specific
execution of the rule as having passed or failed to meet the benchmark.
Select Monitor records that do not meet one or more rules in the data rule workspace.
You can define the validity benchmark by using the following options that can be found in the menu in
the validity benchmark workspace. Start by selecting one of the following options:
% Not Met
Determines the percentage of records that did not meet the rule logic in the data rule. You can
set the benchmark to display a pass or fail condition when this value is greater than, less than,
or equal to a reference value that you specify. For example, to ensure that the percentage of
records that do not meet a data rule never exceeds 10%, you would set the benchmark to
"Not Met % <= 10."
# Not Met
Determines the number of records that did not meet the rule logic in your data rule. You can set
the benchmark to display a pass or fail condition when this value is greater than, less than, or
equal to a reference value that you specify. For example, to ensure that the number of records
that do not meet a data rule never exceeds 1000, you would set the benchmark to
"Not Met # <= 1000."
% Met
Determines the percentage of records that meet the rule logic in your data rule. You can set the
benchmark to display a pass or fail condition when this value is greater than, less than, or equal
to a reference value that you specify. For example, to ensure that the percentage of records that
meet the data rule never falls below 90%, you would set the benchmark to "Met % >= 90."
# Met
Determines the number of records that meet the rule logic in your data rule. You can set the
benchmark to display a pass or fail condition when this value is greater than, less than, or equal
to a reference value that you specify. For example, to ensure that the number of records that
meet the data rule never falls below 9000, you would set the benchmark to "Met # >= 9000."
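A quick worked example of the arithmetic behind these benchmarks, using hypothetical counts (plain Python, not the product):

    # Hypothetical rule execution: 10,000 records checked, 600 failed the rule logic.
    total_records = 10_000
    not_met = 600

    pct_not_met = 100.0 * not_met / total_records    # 6.0
    print("% Not Met =", pct_not_met)                # benchmark "Not Met % <= 10"
    print("benchmark passed:", pct_not_met <= 10)    # True: 6.0% is within tolerance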
Creating a rule set definition
To create a rule set definition, select two or more data rule definitions or data rules and add them to
the rule set. When a rule set is executed, the data will be evaluated based on the conditions of all rule
definitions and data rules included in the rule set.
A rule set definition allows you to define a series of data rule definitions as one combined rule
definition. After you define your rule set definition, you generate a rule set out of the rule set definition.
When you generate a rule set, you bind all of the variables from your data rule definitions, such as
"first_name" and "column_a," to actual data in your data sources, such as "Fred" or "division_codes."
Your representational rule set definition elements are generated as a rule set that is bound to real data.
Once your rule set is generated, you run the rule set to gather information on the data in your projects.
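The sketch below illustrates, in plain pandas with hypothetical data and rules rather than the product API, how a rule set evaluates every record against all member rules and reports the records flagged by one or more of them.

    # Illustrative sketch of rule-set evaluation over hypothetical records.
    import pandas as pd

    records = pd.DataFrame({
        "first_name": ["Fred", "", "Ann", None],
        "division_code": ["A1", "B2", "ZZ", "A1"],
    })

    rules = {
        "first_name_populated":
            lambda r: pd.notna(r["first_name"]) and str(r["first_name"]).strip() != "",
        "division_code_valid":
            lambda r: r["division_code"] in {"A1", "B2", "C3"},
    }

    results = pd.DataFrame({name: records.apply(rule, axis=1) for name, rule in rules.items()})
    flagged = (~results).any(axis=1)   # records failing one or more rules
    print(results)
    print("records flagged by one or more rules:", int(flagged.sum()), "of", len(records))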
Procedure
1. From the Develop Navigator menu in the console, select Data Quality.
2. Click New Rule Set Definition in the Tasks list, located on the right side of the screen.
3. Enter a name for the new rule in the Name field.
4. Optional: Enter a brief description of your data rule in the Short Description field.
5. Optional: Enter a longer description of your data rule in the Long Description field.
6. Optional: Select any benchmarks that you want to set for the rule set definition. You can set
a Validity Benchmark, Confidence Benchmark, or Baseline Comparison Benchmark.
7. Click Save.
Creating a metric
You can create a metric, which is an equation that you define, in order to develop a measurement you
can apply against data rules, rule sets, and other metrics.
You can create metrics to establish a set of key performance indicators (KPI) around the data quality of
the sources that are being evaluated. You can use metrics to aggregate the results of multiple rules and
rule sets to provide you with a higher level of key performance indicators across multiple sources.
Procedure
1. From the Develop Navigator menu in the console, select Data Quality.
2. Click New Metric in the Tasks pane.
3. Required: In the Name field, type a name for the metric.
4. Optional: Provide a short and long description.
5. Optional: In the Validity Benchmark section, select Include Benchmark to set benchmarks to
check the validity of your data.
6. To associate the metric with a folder:
a. Select the Folders view.
b. Click Add. You can search for folders by name and select folders to be associated with
the metric.
7. To develop the logic for the new metric, select from a variety of predefined metric combinations
to build logic for the metric:
a. Click the Measures tab.
b. Select an opening parenthesis if you are grouping lines of logic to form a single
condition.
c. Compose a metric expression that can include rules, rule sets, metric executables, and
functions from the Quality Control and Functions tabs on the Expression palette.
d. Select a closing parenthesis if you are grouping lines of logic to form a single condition.
e. Select a Boolean operator.
8. Save the new metric.
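As a minimal sketch of what such a metric equation can express, here is a hypothetical weighted KPI computed from the % met results of two data rules (plain Python; the rule names, results, and weights are invented for illustration):

    # Hypothetical results from two rule executions and a weighted KPI over them.
    rule_results = {
        "first_name_populated": 97.5,   # % met
        "division_code_valid": 88.0,    # % met
    }
    weights = {"first_name_populated": 0.4, "division_code_valid": 0.6}

    kpi = sum(weights[name] * pct for name, pct in rule_results.items())
    print("customer data quality KPI:", round(kpi, 1))   # 0.4*97.5 + 0.6*88.0 = 91.8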
Advanced analysis and monitoring
• Enables users to easily classify data, display data using semantics, validate column/table
relationships and move to exception rows for further analysis.
• Provides data quality assessment functions such as column, primary key, foreign key, cross-domain
and baseline analysis, and offers 80 configurable reports for visualizing analysis and trends.
• Uses the IBM Information Server scheduling service to allow scheduled execution of profiling, rules
and metrics.
• Provides auditing, tracking and monitoring of data quality conditions over time to support data
governance initiatives.
• Uses project-, role- and user-based approaches to control access to sensitive information, including
the ability to restrict access to original data sources.
Integrated rules analysis
• Provides common data rules to perform trending and pattern analysis and to establish baselines
consistently across data sources.
• Offers multiple-level rules analysis (by rule, record, pattern) for evaluating data issues by record
rather than in isolation.
• Provides pre-packaged data validation rules to reduce development time.
• Offers exception-based management of business rules and transformations.
Scalable, collaborative platform
• Provides native parallel execution for enterprise scalability to support large volumes of data.
• Supports multiple analytical reviews and asynchronous profiling to allow more than one user to
work in a project-based context.
• Uses virtual tables and columns for analyzing data without requiring changes to a host database.
• Provides annotations to enable users to add their business names, descriptions, business terms and
other attributes to tables, columns and rules.
Support for heterogeneous data
• Uses open database connectivity (ODBC) or native connectivity to profile IBM DB2, IBM Informix,
Oracle, Microsoft SQL Server, Sybase, Microsoft Access, Teradata and other data sources such as
text files.
• Allows reuse and sharing of data rules in IBM InfoSphere DataStage through IBM InfoSphere
QualityStage and InfoSphere Information Analyzer to help you align data quality metrics throughout
the project lifecycle.
• Uses metadata to allow analytical results to be shared across all IBM InfoSphere Information Server
modules.
• Integrates with IBM InfoSphere Metadata Workbench and IBM InfoSphere Business Glossary.
• Integrates with IBM InfoSphere Information Analyzer for Linux on System z®, allowing you to
perform data quality functions directly on the mainframe.
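To make the ODBC point concrete, the following is an illustrative pyodbc snippet that profiles the value frequencies of one column in a relational source; the DSN, credentials, table, and column names are hypothetical, and Information Analyzer performs this kind of profiling natively without hand-written queries.

    # Illustrative only: column frequency profiling over ODBC with pyodbc.
    # DSN, credentials, table, and column names are hypothetical.
    import pyodbc

    conn = pyodbc.connect("DSN=SALES_DW;UID=profiler;PWD=secret")
    cursor = conn.cursor()
    cursor.execute(
        "SELECT COUNTRY_CODE, COUNT(*) AS freq "
        "FROM CUSTOMERS GROUP BY COUNTRY_CODE ORDER BY freq DESC"
    )
    for country_code, freq in cursor.fetchall():
        print(country_code, freq)
    conn.close()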