
Service Oriented Architecture



TABLE OF CONTENTS

  • Enabling Authoritative Source with Service-Oriented Architecture
  • What is MDM?
  • Good Data is a Key to Attracting and Retaining Customers
  • Companies Lack Good Customer Data Today
  • A Reference Architecture for Master Customer Data Management
  • A Proven Implementation Process
      • Profile Source Systems
      • Cleanse and Standardize Data
      • Match and Load the Master Cross-Reference
      • Load Analytical Data into the Data Warehouse
      • Define a Data Model for the Services
      • Create Your Data Service Requests
      • Create Update Services
  • Joint Solution by IBM and BEA Systems
  • Companies Must Manage Master Customer Data as an Asset
  • Conclusion
ENABLING AUTHORITATIVE SOURCE WITH SERVICE-ORIENTED ARCHITECTURE

A JOINT WHITE PAPER BY BEA SYSTEMS AND IBM

Customers are the lifeblood of your business. To attract and retain them, your customer-facing processes must be as efficient and effective as possible. Complete visibility into all current information on the customer has an enormous impact on these processes, and every customer interaction should refine the bigger-picture understanding of the customer. But this information is often locked away in multiple systems, making it difficult to obtain the complete, accurate, and current view of the customer that is required.

Traditional solutions to this problem often involve replicating all data in one or more places. But replication is expensive and creates latency-related problems, as well as errors and inconsistency in the data.

This paper describes a Service-Oriented Architecture for the federated management of master customer data. The architecture creates a set of shared data services that allow for consistent accessing and updating of data across multiple systems, providing complete, accurate, and current customer data to any process or application at any time. This approach involves creating cross-reference keys linking multiple systems and creating services against a logical representation of the data in those systems. These data services on a logical model, combined with "clean" cross-reference keys, create a solution that is up to 10X cheaper to build.
WHAT IS MDM?

Every business has elements of core business reference data that are used in multiple types of applications and business processes. This is often the most important data the business has, since it represents the business's understanding of its customers, suppliers, products, inventory, bills of materials, or parts. This type of data is called master data, and it is one of the most important assets a company owns, although it is often not treated that way.

Because of its importance to the business, this data is often stored in multiple systems for different purposes. For example, customer data is captured and stored during the sales cycle in many business applications, and later it is also stored in support systems to provide ongoing customer support after the sale. The problem is that keeping this data synchronized and aggregated is very difficult. In our example, it is likely that after the sale is made, the support system will have better information on the customer than the sales system. If Sales now wants to sell a new product to that customer, it has no way to benefit from the new information in the support system, since this data is kept in separate stovepiped systems with little or no visibility across them.

Master data management (MDM) focuses on creating a single logical representation of this data that can be shared across multiple applications and processes. Rather than creating separate copies of master data with no way to tie them together, MDM provides linkages between separate instances of this data, allowing businesses to maintain a consistent and complete view of it at all times. In effect, MDM allows enterprises to leverage this data as a consistent and important corporate asset.

GOOD DATA IS A KEY TO ATTRACTING AND RETAINING CUSTOMERS

For many businesses, customer data is the most important information in the business. This information is vital to understanding customer buying patterns, identifying up-sell opportunities, providing a higher level of customer service, tailoring and optimizing marketing activities, and predicting and addressing business triggers like renewals, recalls, and upgrades. Today many processes and applications have a direct impact on customers, and most of these are starved for information that could and should have an impact on their operations and decisions. For example, knowing that a customer has recently made a large purchase may affect the level of customer service offered. Unfortunately, this data is simply not available in an aggregated form in most businesses.
"By 2007/08, 30% of the Global 2000 will have created a comprehensive framework for the management of reference data." – META Group

A complete and accurate view of customer data, sometimes referred to as a single or "360° view," provides enormous business benefits. These benefits include a reduction in sales and marketing costs, improved customer satisfaction, higher service renewal rates, and optimized allocation of resources. The basic requirements for getting the optimal value from this data are the following:

  • Complete – It must represent all known information about customers across all systems
  • Accurate – It must be cleansed, de-duplicated, and verified against established business rules
  • Current – It must be up-to-date, reflecting the latest relevant business events
  • Accessible – It must be easily available to enterprise processes whenever they need it

It is easy to point out everyday examples of the customer service frustration that can be caused by a lack of integrated customer data. Anyone who has had to repeat their personal information over and over again on the telephone to various support people in order to get a simple question answered knows this frustration. However, the true business impact goes well beyond this. According to the META Group, "Customer data integration can provide real ROI benefits by improving the underlying quality and real-time accessibility of synchronized customer data."

CASE STUDY: A Global 1000 manufacturing company needed a better understanding of its customers. While its existing EAI infrastructure and CRM systems were excellent at processing customer-based transactions, they were unable to consolidate a complete customer record from the company's multiple source systems. At the same time, its databases contained many duplicate entries for the same customer, with no way of linking records together across systems, or even within the same system. As a result, the company was spending too much money on its marketing programs without seeing a level of results commensurate with the level of investment.
By implementing an MDM solution, the company was able to reduce the average cost per qualified marketing lead by 80%. In addition, it now has a single source of complete and accurate customer data across all systems, which has a positive impact on many other processes as well.

COMPANIES LACK GOOD CUSTOMER DATA TODAY

The problem of distributed and duplicated customer data is not a new one. Companies have used a variety of mechanisms to try to deal with this issue; however, none of them has completely solved the problem of providing complete, accurate, and current data that is easily plugged into customer-centric processes. In fact, they are all likely participants in a customer-oriented master data management solution.

CIF — Traditionally, many companies have used Customer Information Files (CIFs) to centralize customer data. CIFs are usually created by extracting data from source systems in batch and loading customer data into a common location. CIFs fail to meet the requirements for current, accurate, and complete data, since they are loaded infrequently and do little to correct data or link records together across systems.

CRM — Probably the most common misconception is that Customer Relationship Management (CRM) systems solve this problem. These systems allow management of customer-centric processes like sales, marketing, and customer service, but data accuracy is often a problem, and they are unable to provide a complete, cross-system view of data.

Data Warehouse — Data warehousing has also been commonly used as a mechanism to consolidate customer data, as warehouses are excellent sources of complete and accurate data. However, they fail to meet the requirement of current data, since they are incapable of providing up-to-date data and immediate recognition of events. In addition, warehouses are usually designed around dimensional schemas that are optimized for multi-conditional queries, but not well suited to rapid cross-referencing of core data elements.

ODS — Operational data stores (ODS) are another mechanism commonly thought to provide customer data consolidation. However, ODS implementations fail to meet the completeness requirement, since they simply aggregate raw transactions and do not provide context around how individual transactions or their underlying data elements are related, making it impossible to obtain a complete view of any given customer.
More recently, as companies have begun to focus specifically on MDM, other alternatives have emerged. Many of the enterprise application vendors have created master data components that augment their base offering. These products can be effective, but they do not automatically accommodate data completeness or accuracy, so additional software is required. In addition, these products can be difficult to plug into enterprise processes, focusing instead on providing an application to access the data. Achieving the synchronized view in this way is virtually impossible.

All of these and other similar solutions attempt to create a monolithic database for customer data. This approach has fundamental limitations, as replication involves bulk movement of data and creates latency, synchronization, and inconsistency issues.

A REFERENCE ARCHITECTURE FOR MASTER CUSTOMER DATA MANAGEMENT

The following describes the service-oriented architecture for federated master data management. In this architecture:

  1. A matching service (such as IBM® WebSphere® QualityStage™) is used to create cross-reference keys across multiple systems. While creating the cross-reference keys, best practices include data profiling, data cleansing, de-duplication, and standardizing data formats.
  2. An ETL tool (such as IBM® WebSphere® DataStage®) is used to perform the initial load of the cross-reference database, and to create "historical" aggregates, which can be stored in a separate data warehouse.
  3. An EII tool (such as BEA Liquid Data) is used to create composite data services that span all sources, the cross-reference database, and the data warehouse. Note that the cross-reference keys created in step 1 are essential to link data in different systems. All data access and update logic, as well as policies like security and caching, are defined in this layer, which allows for consistent definition of these policies across all systems.
  4. Data cleansing and matching functions are also exposed as services by the EII tool. This means applications can link "Phil" on the phone to "Phillip" in the customer database.

Thus, all of this data is available to calling applications (like dashboards, workflows, and portals) as a service from one logical source. This dramatically simplifies application development and solves the traditional problems of data latency and inconsistency. A minimal sketch of such a composite data service is shown below.
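To make the federated access pattern concrete, here is a minimal Java sketch of a composite data service built over cross-reference keys. All names (CrossReference, SourceSystem, CustomerDataService) are hypothetical illustrations of the pattern, not the Liquid Data or WebSphere APIs.

```java
import java.util.*;

// Hypothetical cross-reference entry: one logical customer mapped to
// the native keys of each participating source system.
record CrossReference(String masterId, Map<String, String> sourceKeys) {}

// Minimal source-system abstraction; real implementations would wrap
// CRM, billing, and support systems behind the same interface.
interface SourceSystem {
    String name();
    Map<String, Object> fetchCustomer(String nativeKey);
}

// A composite data service: resolves the cross-reference, fans out to
// each source, and merges the results into one logical customer record.
class CustomerDataService {
    private final Map<String, CrossReference> xref; // keyed by masterId
    private final List<SourceSystem> sources;

    CustomerDataService(Map<String, CrossReference> xref, List<SourceSystem> sources) {
        this.xref = xref;
        this.sources = sources;
    }

    // Returns the federated view; later fields do not overwrite earlier
    // ones, so the iteration order of `sources` acts as a simple
    // precedence (survivorship) rule.
    Map<String, Object> getCustomer(String masterId) {
        CrossReference ref = xref.get(masterId);
        if (ref == null) throw new NoSuchElementException("Unknown customer: " + masterId);
        Map<String, Object> merged = new LinkedHashMap<>();
        for (SourceSystem source : sources) {
            String key = ref.sourceKeys().get(source.name());
            if (key == null) continue; // customer absent from this system
            source.fetchCustomer(key).forEach(merged::putIfAbsent);
        }
        return merged;
    }
}
```

In this sketch the source iteration order stands in for survivorship; a real EII layer would apply configurable precedence rules, caching, and security at the same point.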
A PROVEN IMPLEMENTATION PROCESS

Implementing this service-oriented architecture is not complex, and it can be up to 10X faster than implementing data replication and its related synchronization processes. The following is a suggested approach that we have seen work for mutual customers.

PROFILE SOURCE SYSTEMS

The first step in an MDM strategy is to understand the data in the source systems. This requires the ability to profile that data to understand its content, structure, and relationships. Automated profiling tools accelerate this process, allowing analysis of column values and structures, and uncovering data anomalies, primary and foreign keys, relationships, and table normalization opportunities. Profiling also helps to uncover business rules within the data that will later be used to provide ongoing validation of data quality.

Profiling is a critical activity for accelerating MDM efforts. When choosing a profiling tool, it is important to select one that can access and understand the various data sources you are trying to reach. In addition, the profiling tool needs to automate the process of understanding data and assist in building a common metadata understanding across systems. Another consideration is the ability to deal with large volumes of data: most tools will suggest profiling a sample set of data, but these approaches often miss important data trends that do not show up in the sample. A simple column-profiling pass is sketched below.
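As a rough illustration of what automated column profiling computes, here is a minimal JDBC sketch. The table name, column name, and connection string are placeholders; a real profiling tool would add pattern inference, key discovery, and cross-table relationship analysis on top of these basic statistics.

```java
import java.sql.*;

// Minimal column profile: row count, null count, and distinct count.
// These three numbers already reveal candidate keys (distinct == rows),
// sparsely populated fields, and low-cardinality code columns.
public class ColumnProfiler {
    public static void profile(Connection conn, String table, String column)
            throws SQLException {
        String sql = "SELECT COUNT(*) AS total, " +
                     "COUNT(" + column + ") AS nonNull, " +
                     "COUNT(DISTINCT " + column + ") AS distinctVals " +
                     "FROM " + table;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            rs.next();
            long total = rs.getLong("total");
            long nulls = total - rs.getLong("nonNull");
            long distinct = rs.getLong("distinctVals");
            System.out.printf("%s.%s: rows=%d nulls=%d distinct=%d%s%n",
                    table, column, total, nulls, distinct,
                    distinct == total && nulls == 0 ? " (candidate key)" : "");
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection; point this at a real source system
        // containing the table being profiled.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            profile(conn, "CUSTOMER", "EMAIL");
        }
    }
}
```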
CLEANSE AND STANDARDIZE DATA

The second step is to clean and standardize the data records. Cleansing and standardizing involve assigning categories to individual elements within the data and applying rules to each data category based on its business content. For example, the text "100 St. Virginia St." would be parsed into four fields of data. A lexical analysis of those fields would determine that "100" is the street number, "Virginia" is the street name, and the second "St." is the street type. The first "St." could be mistaken for a repeated street type, but a contextual analysis would show that it is actually part of the street name: "St. Virginia".

Once the data is categorized, it can first be cleansed, removing any unexpected characters or flagging anything that cannot be categorized. Next, it is standardized to ensure that the same abbreviations and standards are applied consistently to all elements in the same category. In this example, it could be determined that "Street" will always be abbreviated as "St"; the period after the second "St" would therefore be removed. Address verification could then be run to verify that this is an actual address according to local postal records. This ensures that the data is as clean as possible, establishes a foundation for ongoing data cleansing, and lays the groundwork for matching and record linkage. A toy standardization pass over this example is sketched below.

It is important to select a cleansing tool that is flexible enough to handle data other than names and addresses, such as product and material descriptions; some tools only work with names and addresses. Cleansing tools should also be able to validate and certify location data on a global basis, so Unicode support and a global reference file for location data are critical to success.

While cleansing the data, you may also need to create a temporary copy in a staging database. This staging database can be used to build cleansing, matching, and cross-reference creation logic, so that heavy interactions will not impact source systems.
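The following toy Java pass shows the categorize-cleanse-standardize idea on the "100 St. Virginia St." example. The lexicon and rules are deliberately minimal and hypothetical, nothing like a production cleansing engine.

```java
import java.util.*;

// Toy address standardizer for the "100 St. Virginia St." example.
// Real cleansing engines use large lexicons and contextual rules;
// this sketch hard-codes just enough to show the categorization idea.
public class AddressStandardizer {
    private static final Set<String> STREET_TYPES = Set.of("ST", "AVE", "RD", "BLVD");

    public static String standardize(String raw) {
        String[] tokens = raw.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // Cleanse: strip periods and normalize case for classification.
            String t = tokens[i].replace(".", "").toUpperCase();
            if (i == 0 && t.matches("\\d+")) {
                out.add(t); // street number
            } else if (STREET_TYPES.contains(t) && i == tokens.length - 1) {
                out.add(capitalize(t)); // street type: standardized, no period
            } else {
                // Street name; the contextual rule keeps "St" here as
                // part of "St. Virginia" rather than a repeated type.
                out.add(capitalize(t));
            }
        }
        return String.join(" ", out);
    }

    private static String capitalize(String s) {
        return s.charAt(0) + s.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        // Prints "100 St Virginia St"
        System.out.println(standardize("100 St. Virginia St."));
    }
}
```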
MATCH AND LOAD THE MASTER CROSS-REFERENCE

The next step is to create a master cross-reference database. This entails applying matching and survivorship rules to the data in all pertinent source systems to create a database that stores the key structures of the various systems involved in each matched record.

At its most basic level, the master cross-reference simply defines the key-structure relationships between systems for any particular customer. This cross-reference must also contain enough information on the customer to identify a positive match when new inbound data is received. This may be just name and address information, but it may also include other information, such as hierarchical information that describes the relationship of this customer to other customer entities. For example, an employer or parent may be used to help identify an individual.

Creating this cross-reference is one of the most challenging aspects of customer MDM. It requires a strong understanding of the source systems, and it requires that the complex matching and survivorship rules are in place. The engine used to load the data must be capable of working through the large amounts of data in all the source systems. It also must be capable of maintaining a metadata lineage of the sources and processing of the data.

Matching and linking records between systems involves identifying common elements across systems and determining how data will be matched together. It also involves determining a precedence order for selecting data elements when they exist in multiple systems. Most matching products employ a deterministic matching algorithm to determine when records match. This mechanism looks for matches from multiple data sets, or multiple records within a single data set, using full agreement across a set of common variables (e.g., name, phone number, birth date). Some matching products employ probabilistic matching, which also considers the frequency of data values within the database when determining a match, effectively giving less common entries a higher weighting: two John Smiths are less likely to match than two Zeke Durgans. Probabilistic matching is preferable in cases where a reliable and accurate identifying field is not present in all records, as it has been shown to produce higher match rates and lower chances of false positives in these cases. Both styles are sketched below.
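Here is a compact Java sketch of the two matching styles just described. The field choices and weights are illustrative assumptions, not any vendor's algorithm.

```java
import java.util.*;

// Sketch of deterministic vs. probabilistic matching.
public class Matcher {
    record Candidate(String name, String phone, String birthDate) {}

    // Deterministic: full agreement across all common variables.
    static boolean deterministicMatch(Candidate a, Candidate b) {
        return a.name().equalsIgnoreCase(b.name())
            && a.phone().equals(b.phone())
            && a.birthDate().equals(b.birthDate());
    }

    // Probabilistic (simplified): rarer values contribute more weight,
    // so two "Zeke Durgan"s score higher than two "John Smith"s.
    static double probabilisticScore(Candidate a, Candidate b,
                                     Map<String, Integer> nameFrequency, int totalRecords) {
        double score = 0.0;
        if (a.name().equalsIgnoreCase(b.name())) {
            int freq = nameFrequency.getOrDefault(a.name().toLowerCase(), 1);
            score += Math.log((double) totalRecords / freq); // rarity weight
        }
        if (a.phone().equals(b.phone())) score += 3.0;       // fixed weight: exact phone
        if (a.birthDate().equals(b.birthDate())) score += 2.0;
        return score; // compare against a tuned threshold to declare a match
    }

    public static void main(String[] args) {
        Candidate a = new Candidate("Zeke Durgan", "555-0100", "1970-01-01");
        Candidate b = new Candidate("Zeke Durgan", "555-0100", "1970-01-01");
        System.out.println(deterministicMatch(a, b)); // true
        System.out.println(probabilisticScore(a, b, Map.of("zeke durgan", 1), 100_000));
    }
}
```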
Once the cross-reference database is in place, an ongoing mechanism must be created to ensure that new records do not dramatically degrade the quality of the initial load by creating duplicates and unmatched entries. This is accomplished by packaging as services the same matching rules used to create the cross-reference database, which makes it possible to determine whether new customer data already exists in any system. If a match is found, the complete record can be assembled and returned. If no match is found, the customer is a new record and can be entered into the systems.

The matching logic can be packaged as a service that can be easily exposed and reused by other applications and environments. This ensures that the logic is applied consistently and that there is only one point of maintenance going forward. When choosing a data quality product, the ability to create reusable services from matching logic should be considered in order to meet this requirement. In addition, the product must be able to handle high real-time processing volumes and provide high availability, to ensure that outages will not occur during critical operating hours.

LOAD ANALYTICAL DATA INTO THE DATA WAREHOUSE

Concurrent with loading the cross-reference database, many customers also load the related analytical data into the relevant data warehouse for ongoing historical and trend analysis. Incremental updates to the cross-reference database and the data warehouse are a regular part of the production schedule. Unlike the cross-reference database, the data warehouse may receive full record data from the source systems, which may be interesting from an analytical perspective: a customer's purchasing habits over the past 90 days, for instance.

Loading the data warehouse involves transforming the data into a structure that is optimized for analysis. Many customers choose a star schema or snowflake schema for this purpose, which requires the different dimensions of the data to be split out into separate tables. Aggregates and calculations are also often applied to the data to provide additional analytical information; these calculations are performed as the data is loaded into the data warehouse. When choosing a data integration product, consider the ability to load very large volumes of data within very short processing windows, and to trickle-feed as required.
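To illustrate the dimensional split, this minimal JDBC sketch loads one purchase event into a hypothetical star schema (all table and column names are invented): customer attributes are upserted into a dimension table, and the transaction is written to a fact table keyed by the dimension's surrogate key.

```java
import java.sql.*;

// Minimal star-schema load: upsert the customer dimension row, then
// insert a fact row that references it by surrogate key.
public class StarSchemaLoader {
    public static void loadPurchase(Connection conn, String masterId,
                                    String name, double amount, Date purchaseDate)
            throws SQLException {
        long dimKey;
        // Look up (or create) the customer dimension row.
        try (PreparedStatement find = conn.prepareStatement(
                "SELECT dim_key FROM dim_customer WHERE master_id = ?")) {
            find.setString(1, masterId);
            try (ResultSet rs = find.executeQuery()) {
                if (rs.next()) {
                    dimKey = rs.getLong(1);
                } else {
                    try (PreparedStatement ins = conn.prepareStatement(
                            "INSERT INTO dim_customer (master_id, name) VALUES (?, ?)",
                            Statement.RETURN_GENERATED_KEYS)) {
                        ins.setString(1, masterId);
                        ins.setString(2, name);
                        ins.executeUpdate();
                        try (ResultSet keys = ins.getGeneratedKeys()) {
                            keys.next();
                            dimKey = keys.getLong(1);
                        }
                    }
                }
            }
        }
        // Insert the fact row; aggregates (e.g., 90-day purchase totals)
        // would be computed in a separate step during the load window.
        try (PreparedStatement fact = conn.prepareStatement(
                "INSERT INTO fact_purchase (customer_key, amount, purchase_date) " +
                "VALUES (?, ?, ?)")) {
            fact.setLong(1, dimKey);
            fact.setDouble(2, amount);
            fact.setDate(3, purchaseDate);
            fact.executeUpdate();
        }
    }
}
```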
DEFINE A DATA MODEL FOR THE SERVICES

With the cross-reference database and matching services in place, the next step is to develop data services for your applications. Liquid Data uses a model-based approach to developing data services. A model helps you create and maintain data services in an organized fashion: all the complexity of data access, such as transforms, data integration, validation rules, caching, and security, is hidden behind the data model, which creates breakthrough productivity for application development.

The data model in Liquid Data is defined based on the application's data requirements and is mapped to the underlying physical sources. It can encompass the underlying operational data sources, analytical data sources, and other XML and non-relational sources, and it uses the cross-reference information to link the various sources. The data model can be used to get data from a single source or from multiple sources, so all the complexity of accessing or updating the data is hidden from developers.

The mapping in the data model can be simple or complex. It can map to physical data sources and also to services such as IBM's matching engine. For example, in order to get an accurate response when identification data is unreliable or missing, the mapping in the data model is defined to call out to the matching service to obtain a definitive match or a list of probable candidates. Similarly, the data model mapping may call out to a service to get survivorship information when the model has multiple similar potential sources.

CREATE YOUR DATA SERVICE REQUESTS

With the logical data model in place, the application developer can write data services against the virtual data source. The logical data model hides all the complexity of different source types, different APIs, the matching service, survivorship rules, and validation rules from the application developer, who defines services against the one virtual data source and does not need to understand the source data structures or how they relate to that view. A sketch of such a service interface follows below.

The data services can be invoked by multiple types of applications. For Java applications, an XML/SDO-oriented approach generally works better, while reporting applications may require JDBC/SQL-style access to the data. Finally, the queries can also be saved as WSDLs for SOA-centric environments.
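As a rough picture of what writing against the virtual data source can look like to an application developer, consider this Java sketch. The interface and record names are invented for illustration and are not Liquid Data artifacts.

```java
import java.util.List;
import java.util.Optional;

// The logical customer record, as exposed by the data model; the
// developer never sees which source system each field came from.
record LogicalCustomer(String masterId, String name, String phone, String address) {}

// The virtual data source: every operation hides cross-referencing,
// matching, survivorship, validation, caching, and security.
interface CustomerService {
    // Exact lookup by the master (logical) key.
    Optional<LogicalCustomer> findById(String masterId);

    // Fuzzy lookup: internally delegates to the matching service when
    // identifying data is unreliable or incomplete, returning a
    // definitive match or a list of probable candidates.
    List<LogicalCustomer> findByProfile(String name, String phoneOrNull);

    // Update: validation rules decide whether the change is accepted,
    // and the service propagates it to every underlying source system.
    void update(LogicalCustomer customer);
}
```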
Once you understand the types of services your applications will use, you can configure caching strategies, security policies, and validation rules on the logical data model itself. Caching allows the data to be held in memory, so that subsequent requests for the same data do not impact source systems; in most cases, the size and refresh rate of the cache are configurable. Security allows you to control who can access which types of data. Generally, you will need the ability to specify security by data source and by user; for sensitive or financial information, you may implement a query-level or field-level security policy. Validation rules on the logical data model allow you to specify which updates are valid, and hence ensure that only "good" data goes back into your source systems.

In creating the data services, the key issue is performance tuning and debugging. Understanding the performance characteristics of a service request that spans multiple sources, and optimizing its execution path, requires rich tooling.

CREATE UPDATE SERVICES

When any event occurs that creates or updates customer information, these changes need to be reflected in the source systems and in the cross-reference database. The logic for updating this information across systems can be very complex; it involves an understanding of the important data elements across systems, along with the mapping rules. The design of these processes can leverage the metadata and business rules discovered during profiling to jump-start development. In addition, these update processes will likely reuse the existing matching services to ensure that they are not creating duplicate records.

These update processes can be published as services that are callable from any other process or application. This ensures that these common business rules will be shared from project to project rather than re-created, resulting in a higher level of consistency across all processes. Like the query services, these processes need to be secured, to ensure that only entitled resources can call them.
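A minimal sketch of the update flow just described, reusing a shared matching check to avoid creating duplicates. All interfaces are hypothetical and continue the naming of the earlier sketches rather than any vendor API.

```java
import java.util.*;

// Hypothetical collaborators: the shared matching service, a writable
// view of each source system, and the cross-reference store.
interface MatchingService {
    Optional<String> findExistingMasterId(Map<String, Object> candidate);
}
interface WritableSource {
    String name();
    String upsertCustomer(String nativeKeyOrNull, Map<String, Object> data);
}
interface CrossReferenceStore {
    Map<String, String> keysFor(String masterId);        // source name -> native key
    String createMaster(Map<String, String> sourceKeys); // returns new master id
}

// The published update service: one entry point that applies matching,
// then propagates the change to every source and the cross-reference.
class CustomerUpdateService {
    private final MatchingService matcher;
    private final CrossReferenceStore xref;
    private final List<WritableSource> sources;

    CustomerUpdateService(MatchingService m, CrossReferenceStore x, List<WritableSource> s) {
        this.matcher = m; this.xref = x; this.sources = s;
    }

    // Returns the master id the data ended up under.
    String apply(Map<String, Object> data) {
        Optional<String> existing = matcher.findExistingMasterId(data);
        if (existing.isPresent()) {
            // Known customer: update each system under its native key.
            Map<String, String> keys = xref.keysFor(existing.get());
            for (WritableSource src : sources) {
                src.upsertCustomer(keys.get(src.name()), data);
            }
            return existing.get();
        }
        // New customer: create in each source, then record the new keys.
        Map<String, String> newKeys = new HashMap<>();
        for (WritableSource src : sources) {
            newKeys.put(src.name(), src.upsertCustomer(null, data));
        }
        return xref.createMaster(newKeys);
    }
}
```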
JOINT SOLUTION BY IBM AND BEA SYSTEMS

BEA and IBM are working together on a service-oriented architecture for federated master customer data management. In this joint solution, IBM® WebSphere® ProfileStage™ is used to profile the data sources, IBM® WebSphere® QualityStage™ is used to cleanse, standardize, and match the data, and WebSphere DataStage is used to create and load the cross-reference database and the data warehouse. BEA Liquid Data is used to create a logical model spanning all underlying data sources and to define composite data services for the applications. All the matching logic and cross-reference keys generated by the IBM products can be exposed in BEA Liquid Data. The combined products give you a service-oriented approach to federated master customer data management.
COMPANIES MUST MANAGE MASTER CUSTOMER DATA AS AN ASSET

Master data management initiatives are significant undertakings for most companies, because so much investment has gone into creating and maintaining separate instances of reference customer data, and so many processes are linked to this data. The only way to meet the challenges of providing complete, accurate, and current data that is accessible to enterprise processes is to take a comprehensive approach to managing master data. In this paper, we have described a comprehensive service-oriented architecture and approach that we see as applicable to most Fortune 500 companies. The advantages of this approach are:

  • Our initial customer successes indicate that this approach is up to 10X cheaper than approaches based on replicating the data. Replicating the data is expensive due to the cost of data migration and synchronization, and this approach also avoids the fundamental limitations of replication, such as latency and inconsistency issues.
  • Ongoing data quality issues are resolved, as updates are always consistently applied across multiple sources.
  • The shared, services-oriented approach allows for reuse across the enterprise: data access is no longer a problem that every application developer has to solve again and again.
CONCLUSION

Customer data is the lifeblood of a business. A complete and accurate understanding of this data across systems is vital to providing top-tier customer service and maximizing revenue opportunities. Using a federated approach, achieving this requires a master data management strategy that involves data profiling and de-duplication, the creation of a cross-reference database, and the creation of a virtual queryable view across multiple source data systems. The ideal architecture for this is a service-oriented architecture, where shared data services provide consistent access to and update of data across multiple systems in real time.

The benefits of implementing this approach are enormous, allowing any new or existing application, process, or user to get a complete and accurate view of any customer at any time. The quality processes embedded in the design provide ongoing assurance of the validity of the data, and the federated approach reduces the risk, latency, and cost of replicating data across databases.

About Ascential Software

Ascential Software Corporation, an IBM company, is the leader in enterprise information integration. Customers and partners worldwide use the IBM WebSphere Enterprise Integration Suite™ to confidently transform data into accurate, reliable, and complete business information to improve operational performance and decision-making across every critical business dimension. Our comprehensive end-to-end solutions provide on-demand information integration complemented by our professional services, industry expertise, and methodologies. Ascential Software is headquartered in Westboro, Mass., and has customers and partners globally across such industries as financial services and banking, insurance, healthcare, retail, manufacturing, consumer packaged goods, telecommunications, and government. For more information, call 1-800-966-9875 (508-366-3888 if calling from outside the US or Canada) or visit the Ascential Software website.

© 2005 Ascential Software Corporation, an IBM company. All rights reserved. Ascential and Ascential DataStage are trademarks of Ascential Software Corporation or its affiliates and may be registered in the United States or other jurisdictions. IBM, WebSphere, WebSphere Data Integration Suite, and WebSphere DataStage are trademarks of International Business Machines Corporation. Other marks are the property of the owners of those marks.

50 Washington Street, Westboro, MA 01581. 800.966.9875, Option 2; 508.366.3888.