Developing A Universal Approach to Cleansing Customer and Product Data
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Developing A Universal Approach to Cleansing Customer and Product Data

on

  • 3,314 views

Take a look at this review of current industry problems concerning data quality, and learn more about how companies are addressing quality problems with customer, product, and other types of corporate ...

Take a look at this review of current industry problems concerning data quality, and learn more about how companies are addressing quality problems with customer, product, and other types of corporate data. Read about products and use cases from SAP to see how vendors are supporting data cleansing.

Statistics

Views

Total Views
3,314
Views on SlideShare
3,314
Embed Views
0

Actions

Likes
2
Downloads
164
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Developing A Universal Approach to Cleansing Customer and Product Data Document Transcript

  • 1. SAP Thought Leadership Information Management Developing a Universal approach to cleansing cUstomer anD proDUct Data
  • 2. content 4 Executive Summary 5 Data Quality: What’s The Problem? 5 What Is Data Quality? 5 Measuring Data Quality 6 Components of a Data Quality Program 7 A Business Process Perspective to Data Quality 7 Data Quality Problems by Business Entity 8 The Role of Master Data 8 Types of Data Involved in Managing Data Quality 9 Cleansing Customer and Product Data 9 Product Data Example 10 Supporting Standards 10 Supporting National Languages 10 The Issue of Increasing Complexity 11 A Universal Approach to Data Quality Management Is Required 12 SAP BusinessObjects Information Management Solutions 12 Data Integration 12 Data Quality 12 Metadata Management 12 SAP BusinessObjects Data Insight 12 SAP BusinessObjects Data Quality Management 13 SAP BusinessObjects Universal Data Cleanse 14 Product Data 14 Multinational Data 15 Summary 15 For More Information About the Author Colin White, BI Research; BI Research is a research and consulting company whose goal is to help companies understand and exploit new developments in business intelligence and business integration. When combined, business intelli- e gence and business integration enable an organization to become a smart and agile business. For more information, visit www.bi-research.com.
  • 3. eXecUtive sUmmarY Data quality has always been an impor- are addressing quality problems with why companies need to take an enter- tant issue for companies, and this is customer, product, and other types of prise-wide approach to data cleansing even more the case today. Business corporate data. It explains why data and data quality management. Products and legislative pressures coupled with quality projects often evolve into mas- and use cases from SAP are used in the explosion in the amount of data ter data management projects, and the paper to demonstrate how vendors created by organizations is leading to discusses how master data can be are supporting data cleansing, and to increased corporate attention on extracted from unstructured business illustrate how customers are using such improving the quality and accuracy content and integrated into traditional functionality. of business data. corporate data systems. This paper reviews current industry The paper looks at the role of data problems concerning data quality, and cleansing tools in helping improve data takes a detailed look at how companies consistency and accuracy, and explains 4 SAP Thought Leadership – Information Management
  • 4. Data QUalitY: What’s the proBlem? THE COST OF POOR DATA QUALITy There has been a steady stream of arti- Although nobody disputes that poor 100% safe. The degree to which a cus- cles in the press about the impact of data quality has a significant financial tomer can accept a less than perfect or poor data quality on the business ever impact on companies, the figures in safe product will depend on the type of since companies began building data these articles are so variable and so customer buying the product, and on warehouses and examining the quality imprecise that there is a risk of reader how the product will be used. The qual- of data flowing into them. Here are fatigue when discussing data quality ity and safety of a product, however, is some early examples: issues at this superficial level of detail. also dependent on how much money Maybe this is why so many reports a manufacturer is willing to spend to “The survey showed that estimates of show that a high percentage of organi- develop and produce the product, poor information quality costs vary zations feel they have a data quality which in turn is dependent on how from one to 20% of total company rev- problem, while at the same time indicat- much money a customer is willing to enues. This disparity in cost estimation ing that an equally large number of orga- pay for it. All of these factors apply existed not only from company to com- nizations are also doing little about it. equally to providing and supporting pany but among senior information data quality. technology managers within the same The problem is that “data quality” organization.”1 is a vague term that means different Measuring Data Quality things to different people. Before “The Data Warehousing Institute data quality problems can be dis- While researching ISO 8402, I came (TDWI) estimates that poor quality cus- cussed in detail, data quality must across an excellent 1994 paper by tomer data costs U.S. businesses a be defined more precisely. Robert Walker, chairperson of the staggering $611 billion a year in post- Association for Geographic Information age, printing, and staff overhead.”2 What Is Data Quality? (AGI) Standards Data Quality Working Group. He had this to say about the More recent articles show the situation A useful definition of quality can be quality of data: has not improved: found in the ISO 8402 specification, which defines quality as, “The totality “Clearly, quality issues vary depending “Poor data quality costs the typical of characteristics of an entity that bear on the type of data and the purpose to company at least 10% of revenue; on its ability to satisfy stated or implied which it is put. The criteria that are 20% is probably a better estimate.”3 needs.” If we apply this definition to widely used for assessing data quality data, then it means that data quality is are as follows: “Companies are investing billions of dependent not only the characteristics • Lineage: This covers the ancestry of dollars in CRM applications and data of the data itself, but also on the busi- the dataset. It describes the source integration projects to gain a better ness context in which it is used (that is, material from which the data was view of their customers – only to dis- on the business processes and busi- derived and the methods of deriva- cover that conflicting data makes them ness users that employ it). tion. The description of the source blind. Gartner estimates that more than material should include details of edi- 25% of critical data within large busi- This is somewhat analogous to product tions and dates of that source materi- nesses is somehow inaccurate or quality. All product manufacturers have al and any ancillary information used incomplete. And that imprecise data quality assurance programs, but no for updates. The methods of deriva- is wreaking havoc.”4 product is ever 100% perfect, or tion should include details of the 1. Andrea Malcolm, “Poor Data Quality Costs 10% of Revenues, Survey Reveals,” ComputerWorld, July 1998. 2. Wayne W. Eckerson, “Data Quality and the Bottom Line,” TDWI Report Series, 2002. 3. Thomas C. Redman, “Data: An Unfolding Quality Disaster,” DM Review, August 2004. 4. Rick Whiting, “Hamstrung By Defective Data,” InformationWeek, May 2006. SAP Thought Leadership – Information Management 5
  • 5. operations used, the logical process- The first task is to measure and analyze additional data elements can be added es (for example, algorithms and (or profile) the data to determine the as required to the data to enhance its transformations) and details of any type and number of defects that may information content. subsequent processing used in exist. This step provides the knowledge producing the final digital data. required to develop a strategy to Data associated with a business pro- • Accuracy: This is the closeness of improve the quality of the data. Once cess may come from a variety of differ- the topic data associated with an this has been done, the next task is to ent sources, and although the data may object or feature, to the true values, cleanse (parse, standardize, and cor- be consistent and correct for any given or values that are accepted as being rect) the data. This requires parsing it source, merging it with other sources true. It is usually recorded as a per- into its individual components, standard- can lead to duplicate records, and centage correctness for each topic izing data formats and values based on redundant or inconsistent data ele- or attribute. internal and industry business definitions ments. The role of the match and con- • Currency: This is the level at which and rules, and applying cleansing algo- solidate tasks is to resolve and remove the data is current (as opposed to rithms to remove syntactical or semantic this duplicate, redundant, and inconsis- the date of the last update). data errors. At this stage in the process, tent data. Once the data has been • Logical consistency: This is the fidelity of data in relation to the Continuous Monitoring data structures. Measures ongoing data quality • Completeness: This is a measure scores and provides alerts of the correspondence between the Measure real world and the specified dataset. Quantifies the number and types of defects It includes a statement of the rules Consolidate for including or excluding items from Combines unique Analyze the dataset.”5 data elements from Assesses the matched records into a single source nature and Although these comments concerned cause of data defects geographic data, the attributes of Match lineage, accuracy, currency, logical consistency, and completeness apply Identifies duplicate records within Your equally to all forms of data, regardless multiple tables or databases Data of its type or source. To maintain data quality, organizations need to Match & Consolidate constantly monitor and measure these Parse Enhance Identifies and attributes as a part of an ongoing data Appends additional isolates data quality program. data increasing the elements in value of the information data structures Components of a Data Quality Data Enhancement Standardize Program Normalizes data value and Correct formats according to your Figure 1 (courtesy of SAP) illustrates Verifies, scrubs and rules and that of third-party references the main tasks of a data quality appends data based on a set of sophisticated program. algorithms that work with secondary source Data Cleansing Figure 1: Components of a Data Quality Program [Source: SAP] 5. R.S. Walker, Data quality standard, In: Proceedings of AGI’94 Conference, 1994. 6 SAP Thought Leadership – Information Management
  • 6. cleansed, enhanced, consolidated, and An example of the business case for required data quality, and a business merged, it is ready for use. The quality approaching data quality improvement case built to address those data of the data, however, must be moni- from a business process and business quality issues that provide the best tored and profiled on a continuing basis entity perspective can be found in a return on investment. Unfortunately, to measure its accuracy, currency, con- reply to an April 2006 Business Intelli- this understanding is severely lacking sistency, and completeness. gence Network blog (“Surprise! Poor in many companies. Data Quality Costs a Lot of Money”) A Business Process Perspective by David Loshin: Data Quality Problems by to Data Quality Business Entity “At Shell, the business thought it was IT staff usually think of data quality ini- selling 20,000 product combinations to A TDWI survey published in an April tiatives in terms of individual applica- 20,000 commercial customers, and 2006 research report on data quality tions. From a business perspective, was actually selling 5,000 products to (“Taking Data Quality to the Enterprise however, the problems caused by poor 6,000 customers! It can be seen that through Data Governance” by Philip data quality typically surface in the the potential for cost saving in a situa- Russom) showed that customer and business processes of the organiza- tion like that was significant.” product data are the two main types tion. Examples include: of business entity that are especially • Wasted marketing efforts and costs Data quality initiatives to correct susceptible to quality problems. Sur- • Customer dissatisfaction problems like those at Shell can only prisingly, as can be seen in Figure 2, • Shipment, delivery, and billing errors succeed, however, if an organization financial data came third in the table of • Excess inventory understands its business processes, results. Customer and product are two • Incorrect financial statements and the data and data quality required of the main components of the master • Delays in new product introduction by those processes. In this way, actual data of an organization. data quality can be measured against To improve data quality, IT staff must change their application-specific approach to data quality initiatives, and Type of Business Data Percentage of Respondents align them instead with key corporate Customer Data 74% business processes, such as customer Product Data 43% marketing and supply chain optimiza- Financial Data 36% tion, and underlying business entities Sales Contact Data 27% like customer and product. This align- Data from ERP Systems 25% ment helps improve data consistency across application systems, which in Employee Data 16% turn simplifies data integration tasks, International Data 12% such as building an integrated product Other 10% catalog or creating a single view of the customer. Figure 2: Business Entities Susceptible to Data Quality Problems SAP Thought Leadership – Information Management 7
  • 7. The Role of Master Data and simple data structures, has simple As companies begin to realize that they keys and governance rules, is often must focus data quality efforts on spe- Master data is defined as data about standardized (U.S. state codes, for cific business processes and business the key business entities of an organi- example), involves only a few applica- entities, the ability to manage master zation. Examples include customer, tions, and is reasonably stable. data becomes increasingly important. product, organizational structure, and This is why many data quality initiatives chart of accounts. A common question Master business entity data, such as often evolve into master data manage- about master data is, “What is the dif- customer and product, on the other ment projects. The Business Intelli- ference between master data and refer- hand, is usually ill-defined, has complex gence Network recently published a ence data?” Some people take the data structures and relationships, report (“The Pillars of Master Data position that they are the same thing, requires compound and intelligent keys Management: Data Profiling, Data Inte- but it can be argued that not all refer- and complex governance rules, is not gration and Data Quality”) by David ence data is master data. For example, usually standardized, involves many Loshin that discusses this topic in lookup and code tables that are used applications, and changes frequently. detail and is recommended reading. to encode information, such as state names and order codes, are not strictly Does this distinction really matter? Types of Data Involved in Manag- master data tables. When developing data quality manage- ing Data Quality ment and master data management The dividing line between master data systems, it does. Cleaning and manag- A data quality program involves many and reference data is not always clear- ing master reference data is a reason- different types of data. The data may cut. One solution is to break master ably easy job. Cleaning master data, on be current or historic, detailed or sum- data into two types: master reference the other hand, is a complex task and marized, and may have a structured, data and master business entity data. requires data cleansing tools that are unstructured, or semi-structured for- Master reference data has well-defined specifically designed for that purpose. mat. It may consist of master business entity or master reference data, or activity data created by business trans- action (order entry, billing, shipping), BI (reporting, analytics), planning (bud- geting, forecasting), and collaborative (e-mail, blogging) applications. All of this data must be structurally and semantically consistent with its associ- ated technical and business rules. One of the main objectives of a data quality program is to assess and correct data consistency and accuracy problems caused by applications that do not fully enforce those rules. Most data quality initiatives focus almost exclusively on structured data, which represents only a fraction of the data that exists in an organization. There is, however, increasing interest in 8 SAP Thought Leadership – Information Management
  • 8. leveraging the value of unstructured customers are providing data in lan- (CD, Jumbo CD, and so on), and wish data in both master data and business guages other than English. These to standardize the values for yield and intelligence (BI) applications. This is problems are causing considerable minimum deposit. especially the case for customer and business inefficiency and are leading product data, which may be managed to costly mistakes being made. Solving These types of data cleansing opera- in unstructured files, or encapsulated in these problems, therefore, can provide tions can be used for all types of a single field with a structured file or significant business benefit. unstructured information. To date, database. To be useful this unstruc- most companies have focused on tured data must be cleansed and trans- Product Data Example customer data, but many are now formed into a semi-structured (XML, Figure 3 illustrates how unstructured beginning to process product and for example) or structured format. data describing financial product offer- other types of data. ings can be decomposed into separate Cleansing and transforming unstruc- data elements, and then used to Customer data is usually easier to tured data improves operational effi- populate a product table. The original deconstruct and clean than product ciency and can add significant value to unstructured data could come from a data. Although there are variations in existing decision-making applications. Web page or a product catalog, but in people names, company names, titles, Examples of applications here include: some cases may simply be encoded phone numbers, and addresses, cus- • Customer and market intelligence – in a single field of a structured data- tomer data records are reasonably con- Internet Web pages base record. sistent in the number of data elements • Customer sentiment and complaint involved. This is mainly due to the fact analysis – Web logs (blogs) and In this example, the customer wants a that most countries have standardized customer support center call records data quality solution to parse, standard- name and address formats. This is not • Product safety and quality analysis – ize, and cleanse the financial data, and the case for product data, which is Service center records to identify the financial institutions that more variable, and involves a wider • Product master data management – are offering certificates of deposit range of industry standards. Some Product catalogs (CD). They also want information about of these standards are very specific, • Legal discovery – E-mails and maturity (3 months, 1 year, and so on) while others are global in nature. instant messages and the type of CD being offered Cleansing Customer and Product Data Every company struggles with the enor- mous amount of customer and product data that is represented in a variety of ways across disparate systems or departments. A product name such as LIGHT BULB-120 VOLT 100 WATT may be entered multiple times with dif- ferent capitalization, spelling, abbrevia- tions, and punctuation. Adding to this issue is the problem of supporting multiple languages in today’s global marketplace where suppliers and Figure 3: Cleansing Unstructured Product Data SAP Thought Leadership – Information Management 9
  • 9. Supporting Standards Organizations that deal with global information often need to adjust data Many organizations create their own fields for unique regional attributes and hierarchical product codes, while others relationships. In some countries there follow industry standards where possi- are titles and relationships that may not ble. Supply chain efficiency can be be known in the English language. gained by appending industry standard Sonya Gonzalez viuda de Martinez, for product codes to product descriptions. example, translates to Sonya Gonzalez A common master product identifier widow of Martinez. In this case, the shared by all suppliers makes it much text widow of might not be recognized easier to support roll-up and drill-down properly by the organization if it is just reports and analyses based on a family processing standard first and last name of products. elements. This could be important for a banking institution that is trying to dis- Well-known standards include: burse the funds of a will to the correct • United Nations Standard Products Sonya Gonzalez. and Services Code (UNSPSC) • Export Control Classification Number Another example is parsing vendor (ECCN−used by the import/export information for paying invoices. An and transportation industries input string in Portuguese could read • International Organization for Computadoras Brasileiras Incorpora- Standardization (ISO) das representante autorizado de Micro- • National Stock Number Codes soft. The text representante autorizado (NSN)−used by the military de is the equivalent to authorized agent for (in this case, Microsoft). What It is very important, therefore, that exists here is a business relationship The Issue of Increasing data cleansing tools provide support between Computadoras Brasileiras Complexity for these standards. Incorporadas and Microsoft, and this understanding is crucial if a payment is Corporate data is created and managed Supporting National Languages to be made payable to the right party. It by many different business processes is important when dealing with interna- (see Figure 4) and in most companies, Whether an organization has custom- tional data to have a data quality solu- the applications supporting those busi- ers, suppliers, or products in the United tion that is Unicode-enabled, and that ness processes have been developed States, Europe, Asia, or any combina- recognizes international characters oth- over many years. Some involve legacy tion of countries, it is important that the er than those found in the English lan- applications on centralized mainframes, level of data quality is the same across guage. By applying data quality global- while others employ distributed comput- all data stores, and it is vital to be able ly, organizations have a more complete ing technologies, such as client/server to trust the data, regardless of what view of customer and product informa- computing, Web-based computing, language the original data is in. Equally tion, which improves business deci- or Web services. Some are developed important is the ability to present data sions, customer service, and supply in-house, while others use packaged in local languages, and to be able to chain management. solutions from applications vendors tailor data quality program rules to suit a specific language or region. 10 SAP Thought Leadership – Information Management
  • 10. or software-as-a-service vendors. A Universal Approach to Data dynamically by applications (see Company acquisitions and mergers Quality Management Is Required Figure 4). This allows business rules for add to the application mix. These appli- data quality enforcement to be moved cations have usually been developed The trend by organizations toward a outside of applications and applied and deployed independently of each services-oriented architecture (SOA) universally at a business process level, other, without any notion that one day for building new applications and rather than at the individual application the data they create and manage may integrating existing ones reduces level. These services can be called be used by other applications for point-to-point application connections proactively by applications as data is other purposes. and eases application integration. It entered into an application system, also encourages developers to think or reactively by batch data quality man- It is this complex and unmanaged envi- in terms of supporting business pro- agement tools after the data has been ronment that has led to many of the cesses as a set of services, rather than created. Single-record rule enforcement data quality problems that exist today, as a package of monolithic applications. can be done proactively, whereas rule and given the huge growth in the data enforcement that spans records, data- being generated both inside and out- The SOA approach supports any task bases, and applications systems will side of companies, application com- that can be defined as a service. This typically be done reactively. plexity and data quality is likely to concept not only applies to business steadily get worse unless companies process tasks, but also to data and pay more attention to enforcing data system management tasks as well. quality and evolve to support a univer- This enables the tasks associated with sal approach to managing data cleans- a data quality program to be deployed ing and data quality management. as a set of services that can be called Figure 4: An SOA Can Enable a Universal Approach to Data Quality Management web external and content syndicated Business Business plans digital information collaboration planning forecasts media processes email processes budgets instant messages workgroup documents Service-oriented corporate architecture low-latency documents Business Business data transaction intelligence transaction processes processes historical data data Master Data quality integrated services master data data processes SAP Thought Leadership – Information Management 11
  • 11. sap Businessobjects inFormation management solUtions SAP is an industry-leading provider of products, relational databases, model- • An open metadata repository that business intelligence and information ing tools, and third-party solutions. IT allows the tool to be used by management solutions. The objective staff and business analysts use a thin- custom-built applications of information management is to pro- client Web interface to explore metada- • Scheduling of profiling tasks to run vide customers with a flexible solution ta and gain a better understanding of any time without operator for building an enterprise-wide data the metadata used in projects, to sup- intervention integration and data quality manage- port impact analysis, and to report on ment environment. metadata and data lineage. The knowledge gained from SAP BusinessObjects Data Insight concern- SAP BusinessObjects solutions sup- SAP BusinessObjects ing existing data quality issues can be port three key information management Data Insight™ used to develop a detailed strategy for functions in data integration, data cleansing and correcting data defects quality, and metadata management. SAP BusinessObjects Data Insight using SAP BusinessObjects Data software is a data profiling tool that Quality Management software. Data Integration handles the data assessment task (see Key data integration solutions include: Figure 1) of a data quality program. It is SAP BusinessObjects Data • SAP BusinessObjects Data Integra- used to monitor, measure, analyze, and Quality Management tor software, a batch-oriented report on the quality of data stored in extract, transformation, and load relational database systems and flat SAP BusinessObjects Data Quality (ETL) tool supporting enterprise data files. It helps users gain visibility and a Management handles the data cleans- consolidation and federation detailed understanding of the data qual- ing, enhancement, matching, and • SAP BusinessObjects Data ity defects in the source data. Key fea- consolidation tasks of the data quality Federator software, an enterprise tures of SAP BusinessObjects Data program shown in Figure 1. The information integration tool enabling Insight include: main facilities provided by SAP on-demand data access to a variety • Flexible processing models that can BusinessObjects Data Quality of data sources process source data in place or after Management include: • SAP BusinessObjects Rapid Marts® extracting it into SAP BusinessOb- • Parsing and standardizing customer packages, packaged data integration jects Data Insight and other types of operational data solutions for enterprise applications • Automatic discovery of business • Correcting data based on supplied such as Oracle, PeopleSoft, and rules and relationships and user-defined dictionaries and Siebel • Support for data redundancy profil- business rules ing, drill-down frequency distribu- • Appending additional information Data Quality tions, cross-column and cross-table such as phone numbers to provide SAP BusinessObjects Data Insight™ comparisons, and pattern recognition a more complete set of data software and BusinessObjects Data • Allows users to establish thresholds • Matching data to build data relation- Quality Management software enable and set up automated alerts to notify ships, and identify and remove dupli- the profiling, cleansing, enhancement, if analysis results exceed a specific cate records matching, consolidation, and monitoring quality metric • Consolidating data to create a single of data. • Graphical and dashboard reports view across databases (using Crystal Reports® software) Metadata Management that provide summary profiles, fre- SAP BusinessObjects Metadata quency distributions, and referential Management software collects and integrity information unifies metadata from BI objects, ETL 12 SAP Thought Leadership – Information Management
  • 12. SAP BusinessObjects Data Quality Data Quality Project Architect Management now includes universal data cleanse (UDC), an add-on option that allows user-defined transforms and their associated dictionaries and busi- Web Browser Universal Data ness rules to be created for cleansing Web Server Cleanse Data other types of data (unstructured prod- Insight uct data, for example). Key features of Project the data cleansing capability include: portal Data • Identifies, standardizes, and corrects Packaged Quality Applications Data quality global data for over 200 countries web services Server Data Quality • Provides international name (firm and ( NET & Java) Repository person, prefix, title) parsing packages (workflows, for numerous countries Custom transforms Applications and runtime • Identifies customer information metadata) (addresses, phone numbers, social Metadata Repository security numbers, dates) and verifies (runtime statistics) that it is properly formatted • Allows users to add custom dictionar- Figure 5: SAP BusinessObjects Data Quality Management ies and rules to parse and standard- ize other data elements, such as part At the heart of SAP BusinessObjects Support for Web services and an SOA numbers, product codes, purchase Data Quality Management is a data environment in SAP BusinessObjects orders, SKUs, customer identification quality server (see Figure 5) that Data Quality Management allows data numbers, and so forth executes workflows and workflow quality software services to be shared • Handles free-form text up to 8,000 transforms to carry out data quality by multiple business processes. Sepa- characters in size and parses it into management tasks that have been rating data quality rules from the pro- appropriate structured data fields defined using the provided GUI-driven cesses that use them improves flexibili- project architect. These workflows ty and makes it possible for the rules User-defined dictionaries can be can be called dynamically as a Web to be dynamically maintained to meet populated interactively using the service, run as a standalone batch job, changing business needs. project architect, or can be bulk loaded or integrated with external systems and from an XML file. The bulk load capabili- applications, such as SAP software, SAP BusinessObjects Universal ty is useful for loading industry standard Informatica, Oracle, PeopleSoft, Data Cleanse product codes and classifications, and Siebel. weights and measures tables, interna- The data cleansing process of tional language translation tables, and The capability of SAP BusinessObjects SAP BusinessObjects Data Quality data classifications identified using SAP Data Quality Management to call a data Management standardizes data to BusinessObjects Data Insight software. quality workflow as a Web service in an ensure a consistent record format, and SOA environment enables the work- corrects data based on classification flows to be used dynamically by an dictionaries and business rules. Prior organization’s operational business to the current release, the product processes and the quality of informa- provided cleansing transforms that tion to be checked, in-flight, as it flows focused primarily on customer data. between systems and applications. SAP Thought Leadership – Information Management 13
  • 13. Product Data One SAP BusinessObjects customer processes, regardless of what language uses this approach to reconcile data for the original data is in. Equally important An important use case for UDC is the 140,000 products that are managed in is the ability to present data in local cleansing, matching, and consolidation 11 different data repositories. These languages. UDC is Unicode-enabled, of structured and unstructured product techniques could also be extended to supports mixed language product data. A goal for a manufacturer, for support the reconciling of product data descriptions, and incorporates example, may be to improve the consis- for integration into a single master data language-specific lexicons for colors, tency between item descriptions on a management system. sizes, and weights. It is, therefore, Web site and those in product cata- ideally suited for cleansing data in logs. To achieve consistency, the com- Multinational Data companies whose operations span pany would parse the item descriptions many different countries. on the Web site and in the catalogs, Another scenario for UDC is for the understand the relationships between processing of multinational data. For the items, and use the results to cor- an increasing number of organizations, rect the item entries in each location to it is important to be able to create make them consistent. intelligent and standardized parsing 14 SAP Thought Leadership – Information Management
  • 14. sUmmarY Companies today cannot operate effec- It is important that data quality tasks For More Information tively, or compete successfully, unless are not buried in individual applications, they give their users timely access to but are implemented instead as a set of To learn more about SAP accurate and consistent information. services that can be used by any appli- BusinessObjects information manage- The three key words here are timely, cation from anywhere in the organiza- ment solutions from SAP, visit our accurate, and consistent. tion. This approach ensures that busi- Web site at www.sap.com/ ness processes obey a set of common sapbusinessobjects. The problem is that many organizations business rules, and permits data quality don’t know how consistent or accurate service levels to be adjusted without their data is, or how much poor data directly affecting existing applications. quality data is costing them. This is It enables a data quality program to be because these organizations don’t con- deployed in a phased approach, one sider data quality requirements from business process at a time, and allows the perspective of individual business applications to be migrated to a com- users and processes. Data quality is mon data quality approach in an not an all-or-nothing concept. Different orderly fashion. users need different levels of data qual- ity, and it is essential to categorize data Executives have a lot at stake when it by business needs and required data comes to data quality and the tough quality service levels. It is important to corporate compliance environment that profile and constantly monitor data to we live in today. It is crucial, therefore, ensure that it is satisfying those ser- that data quality, like data integration, vice levels, and to rapidly correct any be viewed from an enterprise perspec- deficiencies that are degrading service tive. SAP is a company that recognizes levels. Organizations also want to this need and, through its information ensure that data quality processes are management solutions, is addressing scalable and sharable, and provide the that need. flexibility to allow users to dynamically change business rules based on busi- ness needs and context. SAP Thought Leadership – Information Management 15
  • 15. 50 094 570 (09/03) ©2009 by SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. www.sap.com /contactsap