IBM Software Group
© IBM Corporation
IBM Information Server
Understand - Information Analyzer
IBM Information Server
Delivering information you can trust
Understand: Discover, model, and govern information structure and content
Cleanse: Standardize, merge, and correct information
Transform: Combine and restructure information for new uses
Deliver: Synchronize, virtualize, and move information for in-line delivery
Platform Services: Parallel Processing, Connectivity, Metadata, Deployment, Administration
Support for Service-Oriented Architectures
The IBM Solution: IBM Information Server
Delivering information you can trust
Understand | Cleanse | Transform | Deliver
(Diagram: the four functions of IBM Information Server rest on Parallel Processing; Rich Connectivity to Applications, Data, and Content; Unified Deployment; and Unified Metadata Management.)
Understand: Information Analyzer
Data profiling for understanding what data you have and how it relates to other data, plus data analysis for measuring and monitoring ongoing data quality.
Data Profiling
Critical Problems:
• You don't know what data is really in your legacy systems
• Sources have changed or are new and unknown
Why?
• Data values and relationships are inconsistent and diverge from documented rules
• Incomplete and missing documentation
• Data sources are never static and frequently change without warning
The alternative, manual approach:
• Labor-intensive, resource-devouring process
• Never reviews 100% of data elements
• No infrastructure to support maintenance
• No standardized approach across projects
• 1st-generation tools document problems but don't address their resolution
(Diagram: data sources feeding the analysis, including a mainframe manufacturing system, demographic, contact, billing/accounts, external lists, distribution, an ERP from an acquisition, and a parts BOM.)
About Information Analyzer
• Automates your data discovery process
• Enables you to understand your data before starting development
• Eliminates the risk and uncertainty of using bad data
• Useful in any type of data migration project
• Analyzes every data attribute and reverse-engineers the true metadata of your source
• Reduces the time needed to analyze data
IBM Information Analyzer
• Reduce Time to Value of Data Projects
• Increase the Productivity of Data Personnel
• Assess Data Quality & Consistency across the Enterprise
• Results sharable across IBM Information Server
Data Profiling: the process of analyzing a data source to determine its content, quality, and structure.
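To make the definition concrete, here is a minimal sketch of what column-level profiling computes: row and null counts, cardinality, and a frequency distribution. This is plain standard-library Python with invented names, not Information Analyzer's API.

```python
from collections import Counter

def profile_column(values):
    """Profile one column: basic content, quality, and structure measures."""
    total = len(values)
    nulls = sum(1 for v in values if v in (None, ""))
    freq = Counter(v for v in values if v not in (None, ""))
    return {
        "rows": total,
        "nulls": nulls,
        "null_pct": 100.0 * nulls / total if total else 0.0,
        "cardinality": len(freq),           # distinct non-null values
        "frequency": freq.most_common(10),  # top values for domain review
    }

# Example: profile a 'state' column pulled from a source system;
# the result exposes the mixed-case 'ny' and the empty string as quality issues.
print(profile_column(["NY", "CA", "NY", None, "ny", "TX", ""]))
```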
What does Information Analyzer provide?
• Source System Analysis: provides the key understanding of the source data
  - Column & Domain analysis
  - Table/Primary Key analysis
  - Foreign Key analysis
  - Cross-Domain analysis
• Iterative Analysis: leverages the analysis to facilitate iterative tests
  - Baseline analysis
(Diagram: Column Analysis feeds Primary Key Analysis, which feeds Foreign Key & Cross-Domain Analysis across Source 1 and Source 2.)
Source System Analysis
• Column & Domain analysis: infers a column's classification, physical properties, and frequency distribution from its content
• Table/Primary Key analysis: validates the uniqueness of the identified key column, which ensures that a given row of data can be clearly identified and related to other data
• Cross-Domain & Foreign Key analysis: synchronizes the structure, relationships, and integrity of data environments by finding and validating otherwise unknown relationships and identifying critical integrity violations that need to be rectified (a rough sketch follows this list)
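As an illustration of the idea behind foreign-key discovery (not the product's actual algorithm; the function and tables below are hypothetical), a candidate foreign key can be tested by measuring how much of its value domain is contained in the candidate primary key's domain:

```python
def fk_containment(fk_values, pk_values):
    """Fraction of distinct candidate-FK values found in the candidate PK."""
    fk_domain = {v for v in fk_values if v is not None}
    pk_domain = {v for v in pk_values if v is not None}
    if not fk_domain:
        return 0.0
    return len(fk_domain & pk_domain) / len(fk_domain)

# orders.customer_id vs. customers.id (hypothetical columns)
orders = [101, 102, 102, 103, 999]   # 999 has no matching customer
customers = [101, 102, 103, 104]
print(f"containment: {fk_containment(orders, customers):.2f}")  # 0.75
```

A containment near 1.0 suggests a plausible relationship; anything less points to orphan rows, i.e., the referential-integrity violations the slide mentions.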
Column Analysis: Tabular View
Column Analysis: Chart View
• Frequency Distribution
  - View the frequency distribution either as a table or as a graph
  - Add user-defined values to the frequency distribution
  - Generate reference tables (see the sketch after this list)
  - Sort and filter frequency data
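A reference table is essentially the column's distinct-value domain exported for reuse, e.g., as a validity or lookup table. A minimal sketch of generating one from a frequency distribution, with a hypothetical output file name:

```python
import csv
from collections import Counter

def write_reference_table(values, path):
    """Export the distinct-value domain, with counts, as a CSV reference table."""
    freq = Counter(v for v in values if v not in (None, ""))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["value", "count"])
        for value, count in freq.most_common():  # sorted by frequency
            writer.writerow([value, count])

write_reference_table(["NY", "CA", "NY", "TX", "CA", "NY"], "state_reference.csv")
```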
Column Analysis: Properties
• Properties
  - Six property values are inferred for each column: Data Type, Length, Precision, Scale, Nullability, and Cardinality Type.
  - The distribution of data types, lengths, precisions, and scales is displayed graphically.
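Inference works from observed values rather than the declared schema. A deliberately simplified sketch of inferring a few of these properties from string content (the real product infers more, including precision, scale, and cardinality type):

```python
def infer_properties(values):
    """Infer a column's data type, length, nullability, and cardinality from content."""
    non_null = [v for v in values if v not in (None, "")]
    nullable = len(non_null) < len(values)

    def classify(s):
        try:
            int(s)
            return "INTEGER"
        except ValueError:
            pass
        try:
            float(s)
            return "DECIMAL"
        except ValueError:
            return "STRING"

    types = {classify(s) for s in non_null}
    # Promote to the widest type observed across all values
    if types <= {"INTEGER"}:
        inferred = "INTEGER"
    elif types <= {"INTEGER", "DECIMAL"}:
        inferred = "DECIMAL"
    else:
        inferred = "STRING"
    return {
        "data_type": inferred,
        "length": max((len(s) for s in non_null), default=0),
        "nullability": nullable,
        "cardinality": len(set(non_null)),
    }

print(infer_properties(["42", "7", "13.5", None]))  # DECIMAL, length 4, nullable
```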
Primary Key Analysis Results
• Reviewing Duplicates
  - View a summary of distinct and duplicated values
  - Display a list of all primary key values and the number/percentage duplicated (a small sketch follows)
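Primary-key validation boils down to counting how often each candidate-key value repeats. A sketch, with an invented key column, of the distinct/duplicate summary such a review presents:

```python
from collections import Counter

def pk_duplicates(key_values):
    """Summarize uniqueness of a candidate primary-key column."""
    freq = Counter(key_values)
    dupes = {k: n for k, n in freq.items() if n > 1}
    total = len(key_values)
    dup_rows = sum(dupes.values())
    return {
        "distinct": len(freq),
        "duplicated_values": dupes,  # value -> occurrence count
        "pct_duplicated": 100.0 * dup_rows / total if total else 0.0,
    }

print(pk_duplicates(["A1", "A2", "A2", "A3", "A3", "A3"]))
# 3 distinct values; A2 and A3 are duplicated; 5 of 6 rows are involved
```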
Cross-Domain Analysis Results
Baseline Analysis
• Baseline Differences
  - Detailed results at the column level.
  - Results include column-level summaries of differences for both Structure (Defined and Inferred) and Content (a sketch of the idea follows).
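Baseline analysis re-profiles a source and diffs the result against a saved checkpoint, so drift shows up as explicit structure or content differences. A minimal sketch, assuming profile results shaped like the dictionaries from the earlier property-inference sketch:

```python
def baseline_diff(baseline, current):
    """Report column-level differences between a saved baseline profile
    and a fresh profiling run (structure and content drift)."""
    diffs = {}
    for prop in baseline.keys() | current.keys():
        old, new = baseline.get(prop), current.get(prop)
        if old != new:
            diffs[prop] = {"baseline": old, "current": new}
    return diffs

baseline = {"data_type": "INTEGER", "length": 4, "nullability": False}
current = {"data_type": "STRING", "length": 9, "nullability": True}
print(baseline_diff(baseline, current))
# Flags the type change, longer values, and new NULLs since the baseline
```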
Sharing Analysis across Information Server
ROI: Food Distribution
Company Facts:
• Largest distributor in North America
• Four major acquisitions in the last two years
• 12,000 branded products
• 30,000 clients
• 11 operating centers
Project Goals: integration of supply chain management systems; profit margin analysis systems; field expansion and a take-along project.
Challenges: staff changes and limited documentation related to acquired systems; only 7% of data being analyzed, yet bad data causing 20% of cost overruns; an estimated 10k hours and $650k in costs to support the first four projects.
Results: 80% productivity gain for analyzing data sources; $504,000 annual savings in lower development and maintenance costs; a repeatable process for all future projects that ensures good, actionable data.
ROI: Top US Life Insurance Company
Company Facts:
• #1 largest life insurance company in the USA
• US$138 billion in assets under management
• Offers a complete line of life insurance, investment, retirement, and related products
Project Goals: competitive pressures require the company to further enhance an existing competitive advantage: a 360-degree customer view and 24/7 data availability.
Challenges: detailed customer data resided in ten disparate legacy systems with little to no documentation; presenting raw detailed data 24/7 was impossible.
Results: leveraging IBM allows for consistent data formats, validated data domains, and defined business rules linking policy data; better customer visibility; reduced costs by eliminating expensive and time-consuming investigations of detailed data; redeploying an investigator saves $130k annually.
© IBM Corporation
Thank You
Editor's Notes
1-2. Key Point: The culmination of these efforts has led us to our latest platform offering, the IBM Information Server.

IBM Information Server is a revolutionary new software platform from IBM that helps organizations derive more value from the complex, heterogeneous information spread across their systems. It enables organizations to integrate disparate data and deliver trusted information wherever and whenever needed, in line and in context, to specific people, applications, and processes. It helps business and IT personnel collaborate to understand the meaning, structure, and content of any type of information across any sources, and it provides breakthrough productivity and performance for cleansing, transforming, and moving this information consistently and securely throughout the enterprise, so it can be accessed and used in new ways to drive innovation, increase operational efficiency, and lower risk.

IBM Information Server delivers all of the functions required to integrate, enrich, and deliver information you can trust for your key business initiatives. It allows you to:
• Understand all sources of information within the business, analyzing their usage, quality, and relationships
• Cleanse information to assure its quality and consistency
• Transform information to provide enriched and tailored information
• Federate information to make it accessible to people, processes, and applications

All of these functions are based on a parallel processing infrastructure that provides leverage and automation across the platform. Information Server also provides connectivity to nearly any data or content source, and the ability to deliver information through a variety of mechanisms. Underlying these functions is a unified metadata management foundation that provides seamless sharing of knowledge throughout a project lifecycle, along with a detailed understanding of what information means, where it came from, and how it is related to information in other systems. Integration logic built within IBM Information Server can easily be deployed and managed as a shared service within an SOA.

IBM Information Server provides: access to the broadest range of information sources; the broadest range of integration functionality, including federation, ETL, in-line transformation, replication, and event publishing; and the most flexibility in how these functions are used, including support for service-oriented architectures, event-driven processing, scheduled batch processing, and even standard APIs like SQL and Java. The breadth and flexibility of the platform enable it to address many types of business problems and meet the requirements of many types of projects. This optimizes the opportunities for reuse, leading to faster project cycles, better information consistency, and stronger information governance. Regarding service-oriented architectures, information integration enables information to be made available as a service, publishing consistent, reusable services for information that make it easier for processes to get the information they need from across a heterogeneous landscape.
3. Data Discovery is about reducing project costs and risk by discovering problems in the design stage of your process. In many legacy systems and enterprise applications, metadata, field usage, and general knowledge have changed over time. The data may be perfectly acceptable for whatever purpose it was designed for, but it's often not until you load it into another application that you discover how inappropriate it is for what you want to do. Companies today are moving toward greater integration. As they try to become more customer-centric or real-time, management realizes that data must be treated as a corporate asset and not a division or business-unit tool. Unfortunately, many sources will not be in the right form, or have the right metadata or even documentation, to allow quick integration for other uses.
4. ProfileStage will automatically scan a sample of your data to determine its quality and structure. When you have made the adjustments you need, the product will then process all of the data to create a data model and an ETL job that extracts the information in a usable form. Corporate data degrades very quickly, particularly if you are trying to maximize its use across applications. ProfileStage enables you to discover what data you actually have, so that you reduce the cost and time of developing new processes and can leverage the value of legacy systems in new and valuable ways. ProfileStage automates the data discovery process: because you can analyze all of your data, you can eliminate the risk and cost of using inappropriate data before you begin your development process. ProfileStage can be used in any type of integration process, and we have found that it reduces the time to analyze data by 70%. This is very important because many organizations will not do a complete data check, fearing they will miss their deadlines for project completion. In fact, the reverse is true: by taking the time to analyze the data, our clients discover the valuable links and relationships in their data before development effort is wasted, and they are able to use legacy data in more projects because they have a methodology that ensures its proper use.
5. Reduce Time to Value of Data Projects: fully understanding data content across the enterprise and throughout the development lifecycle yields roughly 30% cost savings over the project lifecycle. Analyst studies have shown that over 75% of data integration projects either overrun or fail. One of the main contributors is the assumption that the data required for the new application is actually available from the source and is in the form the user thinks it's in; in short, a lack of understanding of the metadata can seriously compromise a data integration project. "A defect that isn't detected upstream (during requirements or design) will cost from 10 to 100 times as much to fix downstream." (Steve McConnell, Rapid Development, Microsoft Press.)
Increase the Productivity of Data Personnel: data personnel (business and data analysts, data quality specialists, data architects, and data stewards) get an intuitive, task-based user interface.
Maintain Data Quality Consistency across the Enterprise: an intuitive, visual, ongoing display of pertinent data metrics, with monitoring to ensure data quality throughout the lifecycle and not just during initial implementation.
Integration with information integration tools: results can be handed to DataStage or QualityStage.
6. So what does it do?
7. Single Column Key Duplicates.
8. View Analysis Details: when reviewing Cross-Domain Analysis, View Analysis Details shows a detailed comparison of the paired column to the base column. You can view a comparison of domain values (frequency distributions) and mark the paired column as redundant with the base column.
9. View Baseline Analysis.
10. Headquartered in Grand Rapids, Michigan, this company is the largest food distributor in the US. As part of their aggressive "growth through acquisition" strategy, the company has completed, or is in the process of, acquiring four companies over the last two years. This has expanded their geographic coverage to include most of the continental US and Canada, but it has also created massive financial management issues: information regarding 12,000 branded products, 30,000 clients, and 11 operating centers needed to be consolidated into a single set of purchasing, inventory/fleet management, CRM, and financial reporting systems. Initial efforts to consolidate acquired companies focused on integration of supply chain management systems, profit margin analysis systems, field expansion, and a take-along project. These projects consistently ran over budget and time. The single greatest problem was the lack of documentation and the staff changes relative to legacy systems in acquired companies. Even when analyzing only 7% of source data, source-data discovery problems were driving 20% of all cost overruns for all IT projects. Without profiling tools, they anticipated needing to spend 10K hours ($0.65M in costs) to support four initial projects. In Q2 2003, they began using ProfileStage. It is helping them profile source systems more quickly; create ETL jobs that correctly address source data issues and structure the first time around; find and exploit more, and more powerful, relationships in source data structures; and ultimately improve the use and value of the final integrated data through better understanding of the source information. On an ongoing basis, they plan to use the metadata describing the sources and ETL jobs to more quickly extend or modify the solution. They are seeing an 80% productivity gain through use of the profiling solution over manual source-system discovery, which they expect will result in $504K annual savings in lower development and maintenance costs.
11. Problem: New York Life executives implemented and employed two competitive advantages: a 360-degree view of their customers and data accessible 24/7 via their intranet. The executives plan to amplify these two competitive advantages to reduce costs, maximize profits, and enhance market share. While New York Life can identify individual policyholders owning multiple policies, detailed data is not readily available. The detail data exists in eight to ten disparate legacy systems, each supporting different departments and represented by different (poorly documented) codes and formats. To effectively manage, cross-sell, and up-sell, New York Life needs to integrate and standardize the policyholder data. Moreover, addressing competitive pressures necessitates that New York Life expose their data 24/7 to executive management, brokers, and agents via the intranet. Their poorly documented data could not be exposed to the intranet without doing more harm than good.
Solution: To accomplish a 360-degree view of the customer data, New York Life is using IBM to build an Operational Data Store (ODS) and Corporate Data Warehouse (CDW). Meeting the business reporting requirements has necessitated exposing data to the intranet on a 24/7 basis. While building the ODS and CDW, IBM has enabled New York Life to establish consistent data formats, document suitable business codes across policy applications, validate business data domains, identify empty columns, define business rules linking policy data, and more.
Benefits: Employing IBM established a complete understanding of the data and provided a 360-degree view of the customers. The improved 360-degree customer view reduces costs by eliminating expensive and time-consuming investigations of detailed data within legacy systems. The success of the CDW, with accurate, complete, and timely data, has bred more success: departments within New York Life have flocked to the application as word of the usefulness and timeliness of the CDW data has spread. More people (agents, brokers, executive management) using the CDW has reduced costs and increased productivity. Costs are reduced because IBM has enabled New York Life to identify, and thus eliminate, disparate and incoherent data; multiple departments may now leverage a single resource, the CDW. Productivity has increased because data within the CDW is correct, and end users (e.g., executive management) can focus on leveraging the CDW rather than validating its data. Validating detailed data within the ten disparate legacy systems had required approximately one person per legacy system; since the successful implementation of the CDW, these individuals focus on new, more productive assignments. New York Life no longer spends the approximately $130,000 fully loaded cost of an employee to validate and cross-reference the detailed data. The CDW houses the necessary data.