Data Quality MDM


WHITE PAPER / Data quality: A key factor for successful master data management

Master Data Management systems, also referred to as MDM systems, make it possible to represent a «single version of truth» for a wide range of company data: all the relevant data systems of a company and their contents are consolidated, so that searching for information across several different, isolated systems becomes unnecessary. Irrespective of which MDM system architecture is chosen, high quality of both the content data and the meta data is a key factor for the success of the project. This applies both to the implementation phase and to the hoped-for efficiency gains in the use of the company data.

This White Paper describes the importance of high data quality in the MDM environment, considering all the phases of a system that is being planned or implemented. For existing MDM systems, it also offers suggestions for (a) optimizing the data quality and (b) keeping it at a high level during day-to-day operation.

All company and product names and logos used in this document are trade names and/or registered trademarks of the respective companies.
© Uniserv GmbH / +49 7231 936-1000 / All rights reserved.

Contents

–– Master Data Management: what exactly does it mean?
–– Data quality and master data management
–– The quality of the meta data
–– The quality of the content data
–– The Data Quality Life Cycle
–– Master data management systems: efficiency in use
–– Uniserv Data Quality Products: effective options for integration in your MDM system
–– It’s time to get on board: The Data Quality Audit
–– List of references
Master Data Management: what exactly does it mean?

One of the great challenges in the corporate environment nowadays is effective master data management (MDM). The precise meaning of this term is best explained by the following definition from David Loshin (2009):

«[MDM is…] envisioning how to organize an enterprise view of the organization’s key business information objects and to govern their quality, use, and synchronization to optimize the use of information to achieve the organization’s operational and strategic business objectives.»

The aim is to obtain the so-called «single version of truth», i.e. only one system has to be consulted to display all the relevant data of the entire company for the respective aspect. Searching for information across different systems is not necessary, and working with the data is more efficient. From this it can be recognized that MDM is not just a software solution but, more importantly, a question of the internal organization and definition of company data. In a final step, this data is synchronized in various ways and made available to the end user. Depending on the complexity of the existing systems and data of a company, the successful implementation of an MDM system can take several months or even years.

THE IMPLEMENTATION IS TYPICALLY DIVIDED INTO DIFFERENT PHASES:

–– First of all, the actual master data must be identified and a list drawn up of the systems in which it is stored.
–– The next step is to agree on a Master Data Object Model and ensure that the meta data is consistent.

Once these fundamental aspects have been clarified, the various requirements for the data are decided on. Firstly, the requirements of the data users in the operative area must be considered. These specific demands also have to be considered in the implementation if the MDM system is to be the basis of a Data Warehouse or of Business Intelligence analyses. Further important points are issues such as data protection or compliance with anti-terrorism regulations. In short, all the legal, operative and analytical requirements for the data must be considered. Moreover, the data should be available in an appropriate quality, so that the content requirements for the data can be satisfied.

Setting up a master data management system always means implementing a company-wide project. Projects of this size demand a great deal of brainwork from all concerned, especially when it comes to creating the Master Data Object Model or standardizing the meta data. The current trend is to call in external consultants who, with a degree of detachment, can liaise between the various stakeholders, the departments and the personnel responsible for the various data sources during these phases.
Data quality and master data management

One issue that runs through all phases of master data management is data quality. This concerns very basic aspects of data governance, i.e. specific solutions for the master meta data, standardized data formats, standardized reference keys, databases that are as duplicate-free as possible, postally correct addresses of customers and suppliers, and much more. It also concerns the appropriate data quality measures for each phase of the MDM project and for the subsequent productive system.

No matter from which aspect an MDM system is considered, high data quality is critical for its success and therefore also for its efficient use in the company. What does high data quality mean here, precisely? Firstly, it is certainly important that the data contents are correct, in order to pass on the correct information to the end user in the operative or analytical business. However, basic qualities such as unambiguousness, completeness, up-to-dateness and editability (among others) also play a big part. On the one hand, many of the attributes stated here can be obtained by means of an appropriate system architecture which enables the relevant business rules to be mapped. On the other hand, the characteristics of the data itself, such as accuracy or objectivity, guarantee that it has a high quality.

Two different types of data can be distinguished in MDM:

–– The meta data. This is data which is used to define the attributes of the content data, i.e. the actual user data. It also includes the correct linking of the fields of different data sources, the description of the reference keys, etc.
–– The content data. Here both the quality of the contents and the syntactic quality are concerned. The correct address of a contact is an example of this, as are a standardized date format or a duplicate-free database for a variety of master data.

Both data types are equally crucial for data quality: the high (or low) quality level of one directly affects the level of the other. Regardless of whether an MDM system has already been implemented or an MDM project is planned for the future, there are various possibilities for influencing the quality of the data at the different levels. The quality of the meta data and the quality of the content data are considered separately below to provide a clearer overview.
The quality of the meta data

As mentioned above, meta data is data which is used to describe the attributes of the content data (user data). This includes the field names as well as value ranges or specified data formats. In a wider sense, this category also includes information which makes statements about data models, e.g. the linking of tables in a database.

THE FOLLOWING EXAMPLES PROVIDE AN OVERVIEW OF POSSIBLE PROBLEMS IN THE AREA OF META DATA.

–– If a large number of different data sources are involved, it is often noticeable that fields have the same names but contain different data or information. In the opposite case, different data sources have the same data contents and information, but the field name varies. Figure 1 illustrates this.

   Field name in data source 1    Field name in data source 2
   First name                     Name
   Last name                      Name
   Street                         Address field 1
   Postcode                       Address field 2
   Place                          Address field 3
   Telephone 1                    Phone, business
   Telephone 2                    Mobile

   Fig. 1: The different field names of the two data sources refer to the same data. The meta data of data source 2 is ambiguous and allows a great deal of room for interpretation. The same applies to the telephone data in data source 1.

–– Agreement on data formats is just as important. Dates can be displayed in a variety of formats:

   System 1: 2010-06-11
   System 2: 10-06-11
   System 3: 11th June 2010
   System 4: 11.6.2010

   There are two possibilities for interpretation in the case of the format in System 2: 10th June 2011 or 11th June 2010.

–– The complexity of the meta data and the meaning of field contents can be illustrated using the example of the field «Customer status»:

   Meaning in system 1: customers are those who have received advertising material.
   Meaning in system 2: customers are those who have registered on the company website.
   Meaning in system 3: customers are those who have paid an invoice.
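The date-format ambiguity above disappears as soon as each source system's format is itself recorded as meta data. A minimal sketch of such a normalization step (the system names and format strings are assumptions for illustration):

```python
from datetime import date, datetime

# Hypothetical meta data: the declared date format of each source system.
# Without this mapping, a value such as "10-06-11" is ambiguous.
SYSTEM_DATE_FORMATS = {
    "system1": "%Y-%m-%d",   # 2010-06-11
    "system2": "%y-%m-%d",   # 10-06-11, declared as year-first
    "system4": "%d.%m.%Y",   # 11.6.2010
}

def to_master_date(system: str, raw: str) -> date:
    """Normalize a raw date string into one master representation."""
    fmt = SYSTEM_DATE_FORMATS[system]
    return datetime.strptime(raw, fmt).date()

# All three source representations resolve to the same master value:
assert to_master_date("system1", "2010-06-11") == date(2010, 6, 11)
assert to_master_date("system2", "10-06-11") == date(2010, 6, 11)
assert to_master_date("system4", "11.6.2010") == date(2010, 6, 11)
```

Once the format is declared per system, the interpretation question for System 2 is settled at the meta data level rather than left to the reader of each record.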
[Example record: Roland Pfeiffer, Rastatter Str. 13, 75179 Pforzheim, telephone 07231 936-1000]
Data profiling is recommended in order to obtain a comprehensive overview of the data, especially of the meta data of the source systems. In this respect, the meta data is identified in a reverse engineering process on the basis of the content data. Profiling software provides an overview of the meta data in a very short time and uses a «drill-down» function to provide an overview of the data itself. The number of fields, their names, content types, value ranges, primary key potential and much more besides can be quickly viewed and analyzed. Ideally, further dependencies between the source data tables can also be identified by means of join and dependency analyses. Finally, on-going problems with the data, or with how the data fields are filled in the source systems, can be investigated in the profiling. The problems identified here often cannot be remedied with automatic cleansing at the press of a button but require customization of the processes instead.

After the structure of all data sources which are to fill the master data management system has been ascertained, a Master Meta Data Model must be chosen. This model describes the data in the MDM system; by this point, an agreement on the meaning of the field names and the field contents has been reached across the company. The creation of a Master Meta Data Model is important for the data quality, because it specifies which fields of the data sources fill which fields in the MDM system and which information the fields contain.

[Figure: Representation of the standardized meta data in the MDM system and the mapping from the source systems. Data source 1 (CRM: first name, last name, street, postcode, place), data source 2 (ERP: product no., product name, price) and data source 3 (ordering: name 1, name 2, product, amount) each map into the fields of the MDM system.]
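A first pass of the profiling described above can be sketched in a few lines. The metrics shown (fill rate, distinct values, primary-key potential) follow the list in the text; the field names and sample rows are invented for illustration:

```python
from collections import Counter

def profile(records: list[dict]) -> dict:
    """Tiny column-profiling sketch: for each field, compute the fill rate,
    the number of distinct values, and whether it could be a primary key."""
    report = {}
    n = len(records)
    fields = {f for r in records for f in r}
    for f in sorted(fields):
        values = [r.get(f) for r in records]
        filled = [v for v in values if v not in (None, "")]
        distinct = Counter(filled)
        report[f] = {
            "fill_rate": len(filled) / n,
            "distinct": len(distinct),
            # primary-key candidate: every record filled, all values unique
            "pk_candidate": len(filled) == n and len(distinct) == n,
        }
    return report

rows = [
    {"id": "0815", "name": "Pfeiffer",   "place": "Pforzheim"},
    {"id": "0815", "name": "Diener",     "place": ""},         # duplicate key!
    {"id": "0816", "name": "Mustermann", "place": "Neustadt"},
]
r = profile(rows)
# The profiling immediately exposes that "id" is NOT a safe primary key:
assert r["id"]["pk_candidate"] is False
assert r["place"]["fill_rate"] < 1.0
```

Real profiling tools add type inference, value-range statistics and join analyses on top of this, but the reverse-engineering idea is the same: the meta data is derived from the content data.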
The quality of the content data

The quality of the content data is just as important as the quality and unambiguousness of the meta data in a Master Meta Data Model. Even if the format of the content data matches and the mandatory fields have been filled, this says nothing about the quality of the data. The following example illustrates this aspect:

   Master Meta Data Model    Data record 1        Data record 2       Data record 3
   ID (primary key)          0815                 0815                0815
   First name                M.                   Roland              Max
   Last name                 Diener               Pfeiffer            Mustermann
   Street & house number     Rastaterstrasse 14   Rastatter Str. 13   Sommer-Strasse
   Postcode                  75197                75197               12345
   Place name                Forzheim             Pforzheim           Neustadt
   Telephone, business       07231 936-0          07231 936 0         -
   Fax, business             -                    -                   (e-mail address)

The example shows that even if the meta data is unambiguous, the quality of the content data can leave much to be desired. Contrary to expectations, the ID as the primary key is ambiguous. Although the street line is always filled, the streets are either spelled incorrectly or do not exist. The same applies to the postcodes and the place names. And although the last field of the table should contain an unambiguous fax number, an e-mail address is found there instead. The e-mail address was obviously not considered to be part of the master data: either no field was provided for this information, or no syntactic validation took place when the data was created.

The example may appear trivial, but possibilities are already available for automatic cleansing, particularly in the area of addresses, both in batch processing and in online mode. This automated address cleansing can be used e.g. to validate that an address is correct. There is also the possibility of arranging duplicate or similar data records in groups, attaching information to another data record, eliminating clear duplicates if required, or at least marking them as such.
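Grouping similar records, as described above, usually starts from a normalized comparison key. A deliberately simple sketch follows; the normalization rules are illustrative only, not the matching logic of any actual product:

```python
import re
from collections import defaultdict

def match_key(record: dict) -> tuple:
    """Build a crude comparison key: lower-case the name, strip street
    suffixes and punctuation, so near-identical records collide."""
    name = record["last_name"].strip().lower()
    street = record["street"].strip().lower()
    street = re.sub(r"(strasse|str\.?)\b", "", street)   # "Strasse" == "Str."
    street = re.sub(r"[^a-z0-9]", "", street)            # drop spaces, hyphens
    return (name, street, record["postcode"])

def group_duplicates(records: list[dict]) -> list[list[dict]]:
    """Return only the groups that contain more than one record."""
    groups = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r)
    return [g for g in groups.values() if len(g) > 1]

records = [
    {"last_name": "Pfeiffer",   "street": "Rastatter Str. 13",   "postcode": "75179"},
    {"last_name": "Pfeiffer",   "street": "Rastatterstrasse 13", "postcode": "75179"},
    {"last_name": "Mustermann", "street": "Sommer-Strasse 1",    "postcode": "12345"},
]
dupes = group_duplicates(records)
assert len(dupes) == 1 and len(dupes[0]) == 2   # the two Pfeiffer records collide
```

Once such groups are formed, a human or a survivorship rule decides whether to consolidate, enrich, or merely flag the colliding records.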
The Data Quality Life Cycle

One of the challenges is the continuous activity of the MDM system. The data is never available in isolation or at a standstill; instead, all the data is linked together and continually on the move. With regard to data quality, periodic cleansing of the customer data is therefore only possible to a very limited extent. Each update, no matter whether it is a measure for optimizing the data quality across all the data or concerns the input of additional data records (e.g. after the purchase of further customer data or after the merger of two companies), is inevitably carried out with the system in operation. It is therefore all the more important that all the data records entered in the master data management system meet a certain quality standard before they are entered.

THE UNISERV DATA QUALITY LIFE CYCLE SHOWS WHAT THIS COULD LOOK LIKE:

1ST STEP: The relevant master data in the various source systems must be identified first of all. These data sources can then be analyzed by means of profiling tools, which Uniserv also offers. Problems at the meta data level and in the data itself can thereby be identified and measures to remedy them initiated.

2ND STEP: In a further step, the (initial) cleansing takes place on the basis of the knowledge gained in the profiling. All the content data is first of all validated for correctness. In the case of data from the address environment, this means that the addresses are validated, corrected and supplemented as required. Another component of the initial cleansing could be a duplicate search, or the arrangement of identical or at least similar data records from the same or additional data sources in groups. Consolidation of the data records or enhancement with (external) additional information is also possible. Various parameters can be used to group or link data records from different data sources; identical or similar names and/or addresses would be conceivable. Merging data records from different data sources (e.g. customer data with information from the ERP system) can also be carried out by means of unambiguous keys (identifiers).

3RD STEP: The quality of the data records is validated at the front-end of the data sources during data entry in a real-time check (also referred to as a Data Quality Firewall). The correctness and completeness of the address data is verified first of all. In a further step, it is also checked whether the system already contains a data record with the same contents. High performance of these validations is very important in the real-time check: if they take too much time, they are by-passed, and there is a danger that the master data management system is «contaminated» by un-validated data.

4TH STEP: Even with the greatest possible efforts, permanent validation of the data quality in the MDM system is appropriate. Business rules specified in the business processes can be kept under surveillance through skilful monitoring. If previously specified threshold values of the so-called Key Performance Indicators (KPIs) are exceeded, measures can be implemented in good time, in order to guarantee a permanently high quality of the master data.
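The Data Quality Firewall of step 3 amounts to a validation gate that a record must pass before it reaches the master data store. A minimal sketch with invented rules (a real check would consult postal reference data and a proper duplicate index):

```python
import re

def firewall(record: dict, existing_keys: set) -> list[str]:
    """Return a list of rule violations; an empty list means the record
    may enter the MDM system. The rules here are illustrative only."""
    errors = []
    if not record.get("last_name"):
        errors.append("last name missing")
    if not re.fullmatch(r"\d{5}", record.get("postcode", "")):
        errors.append("postcode is not a valid 5-digit German postcode")
    key = (record.get("last_name", "").lower(), record.get("postcode", ""))
    if key in existing_keys:
        errors.append("possible duplicate of an existing record")
    return errors

existing = {("pfeiffer", "75179")}
ok = firewall({"last_name": "Mustermann", "postcode": "12345"}, existing)
dup = firewall({"last_name": "Pfeiffer", "postcode": "75179"}, existing)
assert ok == []
assert "possible duplicate of an existing record" in dup
```

Because this gate sits in the entry path, its latency budget is small; that is exactly why the text stresses performance, since a slow check is a check that users will by-pass.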
5TH STEP: Especially when larger volumes of data are repeatedly added to the existing master data, periodic cleansing measures with the following objectives are recommended:

–– To detect errors which have entered the system in spite of the Data Quality Firewall, either because not all the rules have been mapped there or because the warnings for these rules have been circumvented by user interventions (e.g. a suspected duplicate with a low probability).
–– To counter the ageing process the data is subject to, particularly in the case of customer MDM systems. Places are merged in local government reorganizations, postcodes are changed, streets are renamed and individuals move house, often without the person or the company informing you. As a result, the error rate for incorrect records in an MDM system typically increases by 1-2 percent per month. This step prevents such decay. It should be closely examined which master data is affected, and it is important that existing links between the data records of different data sources or tables are not jeopardized by the periodic cleansing measures.

In this sequence, the individual steps of the Data Quality Life Cycle as a whole create a sound basis for bringing the quality of the master data of a company to a high level and keeping it there in the long term. Nevertheless, master data management systems are living systems. The concept of a «cycle» (i.e. something recurring) should therefore always be kept in mind, especially against the background of changes and amendments to business rules and the increase in, or modification of, the demands made on the data.

[Figure: Closed Data Quality Cycle. The process for data quality in master data management consists of the initial clean-up with the steps profiling (1) — implementation of data profiling and investigation of the data — and cleansing (2) — analysis of the data quality and cleansing of customer, transaction, order, financial and statistical data — followed by a real-time check at data entry (3) — securing the data quality directly at input — and then monitoring (4) — continuous monitoring of the data quality and compliance with the business rules — and maintaining (5) — integration of external data, provision of data for external systems, application of change reports from third-party companies (anti-ageing) — in a closed loop.]
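The KPI monitoring of step 4 can be pictured as threshold checks over simple quality metrics. The KPIs and threshold values below are invented for illustration; in practice they would be agreed with the business departments:

```python
def kpis(records: list[dict]) -> dict:
    """Compute a few illustrative data quality KPIs over the master data."""
    n = len(records) or 1
    missing_postcode = sum(1 for r in records if not r.get("postcode"))
    ids = [r.get("id") for r in records]
    duplicate_ids = len(ids) - len(set(ids))
    return {
        "missing_postcode_rate": missing_postcode / n,
        "duplicate_id_rate": duplicate_ids / n,
    }

# Hypothetical thresholds; exceeding one triggers a corrective measure.
THRESHOLDS = {"missing_postcode_rate": 0.05, "duplicate_id_rate": 0.0}

def breaches(records: list[dict]) -> list[str]:
    """Name every KPI whose current value exceeds its threshold."""
    values = kpis(records)
    return [k for k, limit in THRESHOLDS.items() if values[k] > limit]

data = [
    {"id": "0815", "postcode": "75179"},
    {"id": "0815", "postcode": ""},       # duplicate id, missing postcode
    {"id": "0816", "postcode": "12345"},
]
alerts = breaches(data)
assert "duplicate_id_rate" in alerts
assert "missing_postcode_rate" in alerts   # 1/3 exceeds the 5% limit
```

Run periodically, such checks catch the gradual decay described in step 5 before it accumulates.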
Master data management systems: efficiency in use

Two points in particular must be considered in order to ensure efficient handling of the master data:

–– First of all, there is error-tolerant searching for data records. A search for identical or supposedly identical data records is activated when a new data record is created. The search can be started actively at the press of a button, but it also runs in the background without being actively started. If the system finds identical data records or potential duplicates, the similar data records are displayed to the end user in the form of a select list, so that he can decide whether the current customer is actually a new customer or an existing one.
–– Another important factor is the speed of this error-tolerant search. The end user depends on quick response times from the system. If he has to wait too long for the results of a duplicate check, it is by-passed and the quality of the master data deteriorates.

However, error-tolerant searching for data records is not only an important criterion at initial data creation. The search must also be fast and provide reliable results when data records are looked up, e.g. in a service centre. One characteristic of an MDM system is that the data stream flows in two directions: data is transferred to the system (e.g. the initial creation of a customer record) and data is retrieved from the system by the user. Needless to say, this data also has to be correctly linked to all the other data in the respective context. It must therefore be guaranteed that when the data of the customer Roland Pfeiffer is retrieved, the system supplies not only his contact data but also e.g. the products which he has purchased at a certain price at a given point in time, any complaints he has made, and which advertising material he has received (Single View of Customer).
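The error-tolerant search described above can be approximated with a similarity ranking. Commercial products use far more sophisticated phonetic and linguistic matching, so the following is only a sketch of the select-list idea:

```python
from difflib import SequenceMatcher

# Hypothetical stored customer names.
CUSTOMERS = ["Pfeiffer", "Pfeifer", "Mustermann", "Diener"]

def fuzzy_search(query: str, names: list[str], cutoff: float = 0.8) -> list[str]:
    """Rank stored names by similarity to the query and return the
    candidates above the cutoff, best match first (the "select list")."""
    scored = [(SequenceMatcher(None, query.lower(), n.lower()).ratio(), n)
              for n in names]
    return [n for score, n in sorted(scored, reverse=True) if score >= cutoff]

# A misspelled query still surfaces both spelling variants as candidates,
# so the end user can decide whether the customer already exists:
hits = fuzzy_search("Pfeifferr", CUSTOMERS)
assert "Pfeiffer" in hits and "Pfeifer" in hits
assert "Mustermann" not in hits
```

The cutoff embodies the trade-off the text describes: too strict and real duplicates slip through, too loose and the select list becomes noise the user learns to ignore.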
Uniserv Data Quality Products: effective options for integration in your MDM system

Where high data quality and the associated efficiency of the MDM system are concerned, it is irrelevant which architecture is chosen. The quality of the data should always be at a high level, no matter whether the basic approach of a listing variant, a completely consolidated master data set, or an intermediate stage of the two variants was selected. In this respect, the most interesting question is how quality assurance measures can be integrated most effectively in the data stream.

The use of the Data Quality Explorer is a good idea in the planning phase of the master data management. The profiling of the different data sources described above can be carried out quickly and simply by means of the DQ Explorer. Well-developed workflow functions in a client/server architecture enable long-term optimization of the source data systems. Any irregularities can be attached to the respective data records in the form of notes, and jobs concerning problematic data can be assigned to personnel by e-mail. Continuous optimization and the preparation for the initial transfer of the data to the MDM system can thereby be pursued over the term of the project.

The Data Quality Monitor supports the monitoring of KPIs. As a result, the optimizations of the source systems can be checked by means of business rules. It can also be used in existing master data management systems. Monitoring jobs can be configured very easily and started automatically; automatic notification of the status of a monitoring job is also possible.

Where the initial consolidation of different systems and an initial cleansing of the data are concerned, the Uniserv Data Quality Batch Suite provides an excellent option for preparing the data of several source systems for input into the MDM system.
Not only are addresses validated for postal correctness and corrected if necessary in the DQ Batch Suite; the data of the various source systems can also be arranged in groups and sorted in an individual process tailored to the respective business rules. After the data has been grouped, it can be consolidated. It is thereby guaranteed that no important data is lost and that the data sources are appropriately linked. Needless to say, primary keys can also be generated for the subsequent matching on retrieval of the data.

Master data management systems are also characterised by the fact that data records are repeatedly changed through initial data creation, supplementary information or corrections. The quick return of a requested data record must also be guaranteed by the system architecture. One concept for executing these changes and data calls is the integration of a Service-Oriented Architecture (SOA), by means of which business rules can be implemented in applications. With respect to data quality and the use of the Uniserv Real-Time Services, this means that duplicate data records can be prevented and the automated correction of address data is guaranteed.
It’s time to get on board: The Data Quality Audit

The Uniserv DQ Audit can be used to make statements about the status quo of the in-house data. The audit is the instrumental first step for making reliable decisions, and marks your personal introduction to the project «Data Quality in your MDM system». During the audit, the quality of the addresses is primarily evaluated with the support of the data quality tools from Uniserv. In a second step, there is the possibility of getting to the root of the possible causes of deficient data quality in a process analysis. So the best thing to do is to contact us right away!

For further information about MDM, please visit our web page or contact us directly. We look forward to advising and supporting you throughout your project.

List of references

–– Loshin, D. 2009. Master Data Management. Morgan Kaufmann OMG Press. 274 pp. ISBN 978-0-12-374225-4.
Uniserv

Uniserv is the largest specialised supplier of data quality solutions in Europe, with an internationally usable software portfolio and services for the quality assurance of data in business intelligence, CRM applications, data warehousing, eBusiness, and direct and database marketing. With several thousand installations worldwide, Uniserv supports hundreds of customers in their endeavours to map the Single View of Customer in their customer database. Uniserv employs more than 110 people at its headquarters in Pforzheim and its subsidiary in Paris, France, and serves a large number of prestigious customers in all sectors of industry and commerce, such as ADAC, Allianz, BMW, Commerzbank, DBV Winterthur, Deutsche Bank, Deutsche Börse Group, France Telecom, Greenpeace, GEZ, Heineken, Johnson & Johnson, Nestlé, Payback, PSA Peugeot Citroën as well as Time Life and Union Investment.

Experience: over 40 years. Market position: largest European supplier. Employees: more than 110 people.

Contact: UNISERV GmbH • Rastatter Straße 13 • 75179 Pforzheim • Germany • T +49 7231 936-0 • F +49 7231 936-3002

© Copyright Uniserv • Pforzheim/Germany • All rights reserved.