Data Quality and SOA

Data Quality meets SOA – Making Data Quality available for all Business Processes.

Data quality functions were already being provided as services for Unix, Windows and Linux before Gartner analysts coined the term SOA. This architecture was chosen mainly for technical reasons. The implementation of service-oriented architectures also brings new and changed requirements for data quality services, and it increases the opportunities and benefits which they can create.

Published in: Business

  1. White Paper: Data Quality meets SOA – Making Data Quality available for all Business Processes

     All company and product names and logos used in this document are trade names and/or registered trademarks of the respective companies.

     Uniserv GmbH • Rastatter Str. 13 • 75179 Pforzheim/Germany • T +49 7231 936-0 • F +49 7231 936-3002 • E info@uniserv.com • www.uniserv.com
     © Copyright Uniserv • Pforzheim • All rights reserved.
  2. The starting point for data quality services

     To assess what service-oriented architectures mean for the provision and use of data quality functions, it is useful to look first at the typical data quality functions themselves.

     a) A classic application is the validation of a postal address against reference data, which includes street and place names and the dependencies of the postcodes on places, streets and house numbers. In contrast to simple database access, the correct address should also be found if the input is incomplete or contains recording or hearing errors. The goal is a high matching accuracy, so that the greatest possible number of incorrect addresses can be corrected automatically.

     b) Another application is the search for duplicates in the in-house database. Here too, the goal is to identify a business object (a business partner, a product or a sales opportunity) quickly and reliably in spite of incomplete, divergent or incorrect input, in order to (a) simplify the search and thereby increase the productivity of the users, and (b) prevent the creation of duplicates, i.e. multiple entries which refer to the same object in the real world. The result is a consistent, complete and unambiguous mapping of the real objects in the database.

     An important goal in improving data quality is to prevent incomplete or incorrect data from being stored in the database in the first place. Possible problems should be detected at data entry and then cleaned up either automatically or manually by the user after appropriate feedback.

     Specialized search indices ensure that a search in databases with 1,000,000 to 100,000,000 data records normally requires only a fraction of a second, even with divergent spelling. These response times, however, require intelligent caching of the indices. This can be provided very efficiently by implementing the software as a central service which is made available from an in-house server.

     Apart from the response time, integration into a wide range of environments already played an important role for data quality services. Decoupling the data quality services from the service consumers and using them via a client/server protocol is important for reaching this goal.

     Data quality functions had therefore already been provided and used as a service, at least for more sophisticated applications, before service-oriented architectures were invented. This made it possible both to meet the demanding response-time requirements and to guarantee that the functions are available in a wide range of environments.
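The error-tolerant search described above can be illustrated with a small sketch. It is not the vendor's matching algorithm, which the paper does not describe. It assumes a plain character-based similarity from Python's standard `difflib` module, and the function names, the sample records and the threshold of 0.8 are chosen purely for illustration. A production service would use specialized, cached search indices instead of the linear scan shown here.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case the text and replace punctuation by single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_candidates(query, records, threshold=0.8):
    """Return (score, record) pairs at or above the threshold, best first.

    The threshold is an illustrative assumption, not a recommended value.
    """
    scored = [(similarity(query, record), record) for record in records]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, reverse=True)


# Hypothetical sample data standing in for the in-house database.
records = [
    "Rastatter Str. 13, 75179 Pforzheim",
    "Hauptstrasse 5, 76133 Karlsruhe",
]

# The query is misspelled and lacks the postcode.
candidates = find_candidates("Rastater Str 13 Pforzheim", records)
```

The same function serves both use cases: run against reference data it supports address validation, and run against existing records before an insert it acts as a simple duplicate check.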
  3. From the fat client to the 3-layer architecture

     [Figure: FAT CLIENT versus LAYERED ARCHITECTURE (CLIENT, APPLICATION SERVER). Each panel shows an entry form with the fields Name, Street and Postcode and the roles Customer, Supplier and Reseller.]

     The typical architecture in the early days of the client/server world was built around a database which, in addition to storing transaction and master data, enabled asynchronous communication between different system components and thereby allowed these components to be decoupled. Messages were written to the database by the sender and read there by the receiver. This, however, required regular polling of the respective table in order to establish whether new unprocessed messages were available. The business logic was mainly implemented in so-called fat clients. Batch tasks were implemented as background processes which accessed the database.
  4. Interactive functions for validating addresses or detecting duplicates directly at data entry were normally integrated into the graphical user interface. They were usually called via proprietary interfaces. Specifications such as DCE [1] or CORBA [2], whose goal was to standardize the interfaces for communication between distributed components, never had more than a niche existence.

     This situation has changed fundamentally in the past few years. The starting point was mainly the establishment of standards within the framework of JEE [3] (Java Enterprise Edition), which led to high-performance implementations of these standards both as commercial products and as open source solutions. For the Windows world, Microsoft followed with the development of .NET [4] as a language-independent platform and the .NET Enterprise Services. As a result, high-performance infrastructure software became available irrespective of the selected platform (JEE or .NET), making it possible to largely detach the business logic from the presentation layer and implement it in a layer of its own on the server.

     This also changed the requirements for data quality services, which were now called mainly from the business logic on the server side. Simple integration into the application server, with either a JEE or a .NET architecture, now came to the fore.

     [1] http://en.wikipedia.org/wiki/Distributed_Computing_Environment
     [2] http://en.wikipedia.org/wiki/CORBA
     [3] http://en.wikipedia.org/wiki/Java_Platform,_Enterprise_Edition
     [4] http://en.wikipedia.org/wiki/.NET_Framework
  5. SOAP as an Enabler for SOA

     SOA does not become really effective until a standard protocol for the provision and use of services has been established. This gap was closed by the Web Service protocol SOAP [5]. It is supported by practically all middleware and infrastructure components, thereby enabling interoperability between service providers, middleware and service consumers. As a result, connectors for the use of proprietary protocols in proprietary middleware are no longer necessary. This lays the basis for the establishment of powerful middleware components. These include the Enterprise Service Bus [6] (ESB), which enables the loose coupling of different components and plays a significant role in the routing of messages, as well as engines which can directly execute workflows defined in a business process language (BPEL) [7].

     SOAP-based Web Services have therefore developed into a central instrument for the provision of interactive data quality services in modern enterprise architectures. These are mainly services which follow the request/response pattern: the service consumer initiates a request, e.g. to validate the specified address, and the service responds directly with a confirmation, a correction suggestion or a selection of possible correct addresses. This procedure corresponds to the interactive character of the validation, which should support the user directly at entry of a business object and offer options for intervention in the event of problems. However, this function is no longer implemented in isolation in the presentation layer but normally takes place in the context of a higher-ranking business process, e.g. the ordering process in an e-business application, a process for lead conversion and qualification in a CRM application or a comparable process. The correlation between the implementation of business processes and data quality functions becomes immediately obvious, and with it the contribution which these functions make to the success of the respective business process.

     [5] http://en.wikipedia.org/wiki/SOAP
     [6] http://en.wikipedia.org/wiki/Enterprise_service_bus
     [7] http://en.wikipedia.org/wiki/BPEL
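The request/response pattern described above, with its three possible answers (confirmation, correction suggestion, selection of possible addresses), can be sketched as follows. The transport layer (SOAP) is left out on purpose. The type names, the status values, the sample reference data and the cutoff of 0.85 are assumptions made for this illustration, and the matching relies on Python's standard `difflib` module instead of a real address engine.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches
from enum import Enum
from typing import List


class Status(Enum):
    CONFIRMED = "confirmed"    # input matches the reference data exactly
    CORRECTED = "corrected"    # exactly one close match: correction suggestion
    AMBIGUOUS = "ambiguous"    # several close matches: caller must choose
    NOT_FOUND = "not_found"    # no usable match


@dataclass
class ValidationResponse:
    status: Status
    candidates: List[str] = field(default_factory=list)


def validate_address(address, reference, cutoff=0.85):
    """Answer one validation request without keeping any state between calls."""
    if address in reference:
        return ValidationResponse(Status.CONFIRMED, [address])
    matches = get_close_matches(address, reference, n=5, cutoff=cutoff)
    if not matches:
        return ValidationResponse(Status.NOT_FOUND)
    if len(matches) == 1:
        return ValidationResponse(Status.CORRECTED, matches)
    return ValidationResponse(Status.AMBIGUOUS, matches)


# Hypothetical reference data.
REFERENCE = [
    "Rastatter Str. 13, 75179 Pforzheim",
    "Rastatter Str. 15, 75179 Pforzheim",
    "Bahnhofstr. 1, 76133 Karlsruhe",
]

confirmed = validate_address("Rastatter Str. 13, 75179 Pforzheim", REFERENCE)
corrected = validate_address("Bahnhofstr 1, 76133 Karlsruhe", REFERENCE)
ambiguous = validate_address("Rastatter Str. 1, 75179 Pforzheim", REFERENCE)
```

A calling business process, such as an ordering workflow, would branch on `status`: continue on a confirmation, apply or display the suggestion on a correction, and ask the user to pick from `candidates` when the answer is ambiguous.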
  6. Customer Cases

     Orange

     The customers of the telecommunications company Orange in France can contact the company via various channels. They can visit the Orange web portal on the Internet, call the Orange call center or visit the mobile phone shop of an Orange partner. Irrespective of which contact channel the customer chooses, it must be ensured that the respective processes are executed with the same quality. An important component of this process quality is ensuring that customer addresses are correct.

     The technical basis is the open source Java application server JONAS. This JEE server is the central point for the provision of the services which are required to implement the business processes of Orange. The Uniserv data quality service for the validation, restructuring and standardization of customer addresses is also provided in this environment. This service-oriented approach makes it possible to use the same services in different processes and different channels. Irrespective of whether a new Orange customer is created or an existing customer changes address, and irrespective of the contact channel via which a process is initiated, the underlying service-oriented architecture always ensures that the executed processes are configured consistently and can access the same services. This makes it possible to guarantee a consistently high quality standard of the address data across the company.

     WinGroup AG

     The German WinGroup AG, a service network for sales and marketing, offers its customers an extensive range of services in the areas of call center, lettershop, dialog marketing and IT services. In order to guarantee a consistently high quality level of the underlying processes in all service areas and customer-specific applications, the subsidiary WinLogic has developed a service-oriented architecture based on an Enterprise Service Bus. This serves as the central middleware for connecting all applications in the company to the central services. The address validation and the duplicate check of Uniserv are integrated here to safeguard data quality.
  7. Lightweight REST – when SOAP is too unwieldy

     Even if the SOAP-based protocol seems to be the ideal solution for integration into typical enterprise middleware, there are various applications where alternatives such as RESTful [8] services are advantageous. This is particularly the case when data quality services are to be called directly from a presentation layer which is HTML/AJAX-based. A typical scenario is an input aid which automatically completes a partial input by the user or offers suggestions for completing it. Input aids are located in the presentation layer by their nature. However, they place considerably higher demands on the response time, since they are called more frequently during the input, usually after each input character.

     Although the same services used for address validation would be the obvious choice for address input here, SOAP-based Web Services are not always suitable for this application scenario on account of the overhead which the SOAP protocol entails. RESTful Web Services are extremely lean in comparison. The call is made via the HTTP protocol, with the call arguments encoded in the URL. In the case of a call from JavaScript, the result is ideally returned in the JSON format [9]. The browser's JavaScript interpreter can process this notation directly, so that the result of the call is available immediately as JavaScript objects. Services configured in this manner are ideally suited to typical Web 2.0 applications.

     SOAP - the fine differences

     Adapting proprietary interfaces to SOAP-based communication also involves a learning curve. The packets exchanged in SOAP-based communication are described in a metaformat, the WSDL (Web Services Description Language) [10].

     During the development of the WSDL, it must first of all be ensured without fail that the data types used are actually supported by all the target languages and systems in which the service is to be consumed. This aspect is relatively uncritical in a pure in-house development with a homogeneous software infrastructure, e.g. JEE or .NET. However, it is essential for a data quality service which must be usable in a great variety of environments which may not even be known beforehand.

     In addition, there are two basic options for defining in the WSDL the linkage between the packet structure in XML and the constructs of the programming language which provides or interprets the packet. As the name suggests, the so-called rpc-style corresponds more to the conventional Remote Procedure Call (RPC) and models operations as method calls which do not differ from local calls. It is ideal when the Web Service is to be called from an object-oriented programming language such as Java or C#. The so-called document-style is more suitable for modelling complex contents, with the result represented as an XML document with its own XML schema. On the one hand, the result can then be validated against the XML schema using standard XML tools; on the other hand, the result document can easily be processed further, transformed or prepared for the presentation layer using standard XML tools or suitable frameworks. Both variants have their advantages and disadvantages. For services which need to be callable from any context, both variants may have to be made available.

     Web Services are stateless by their nature. Ideally, this means that there are no partial results; instead the overall result is returned by the call, and two successive calls are totally independent of each other.

     Web Services must be scalable. A prerequisite for this is the statelessness described above. In addition, the Web Service should access global resources as little as possible or not at all. If such access is necessary, the administration and synchronization of the access to these resources should never be implemented ad hoc for the respective Web Service. Resource pools such as are offered by most application servers or exist as open source extensions should be used instead. This enables effective and configurable pooling.

     The last two points in particular normally require at least a partial redesign if existing functionality is to be made available as a Web Service.

     [8] http://en.wikipedia.org/wiki/REST
     [9] http://en.wikipedia.org/wiki/JSON
     [10] http://en.wikipedia.org/wiki/Web_Services_Description_Language
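The RESTful call described above (arguments encoded in the URL, result returned as JSON) can be sketched with Python's standard library. The base URL, the parameter names `q`, `country` and `limit`, and the shape of the JSON response are hypothetical; the paper does not specify an actual interface. No network request is made here, only the request URL is built and a sample response is parsed.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint of an address suggestion service.
BASE_URL = "https://dq.example.com/address/suggest"


def build_suggest_url(partial_input, country="DE", limit=5):
    """Encode the call arguments in the URL of an HTTP GET request."""
    query = urlencode({"q": partial_input, "country": country, "limit": limit})
    return BASE_URL + "?" + query


def parse_suggestions(payload):
    """Turn the JSON response body into a plain list of suggestion labels."""
    data = json.loads(payload)
    return [item["label"] for item in data.get("suggestions", [])]


# What an input aid might request after the user has typed "Rastatter S".
url = build_suggest_url("Rastatter S")

# Assumed response body for illustration.
sample_response = '{"suggestions": [{"label": "Rastatter Str., 75179 Pforzheim"}]}'
labels = parse_suggestions(sample_response)
```

In a browser the same response would be consumed from JavaScript; Python is used here only to keep the examples in one language. Compared with a SOAP call, there is no envelope to build or parse, which is where the lower overhead per keystroke comes from.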
  8. SOA blurs the distinction between on premise and on demand

     The provision of software services and applications via the Internet, referred to by the buzzwords Software on Demand, Software as a Service [11] or Cloud Computing, is a topic of growing importance. Locally installed software (on premise) and software used via the Internet (on demand) are often portrayed as fundamentally opposed. From the SOA perspective this is not the case, because within a service-oriented architecture it is not critical how the respective service is provided.

     Providing a service via the Internet as an alternative to a locally installed server opens up interesting new application possibilities, particularly for data quality services which carry out validations and corrections by matching and merging against reference data. The reference data required for the service must be updated regularly. This means both regularly recurring manual work and regular subscription charges which have to be paid to the data provider. In the case of country-specific postcode directories, this work and expense are incurred for each country for which addresses are checked. As a result, such solutions are only worthwhile for larger quantities of data. This restriction does not apply if the service is provided as a SaaS offering in which invoicing is based exclusively on the executed transactions.

     In a service-oriented architecture, locally installed services and services used via the Internet can also be combined or exchanged as required. This makes it possible to find an optimum solution for the respective user, and to adapt it flexibly later when conditions change.

     [11] http://en.wikipedia.org/wiki/Software_as_a_Service
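The point that a consumer need not care how a service is provided can be sketched with a shared interface and two interchangeable implementations. All class and function names are hypothetical, the remote transport is replaced by a plain callable so that the sketch runs without a network, and the per-call price is an invented figure that only illustrates transaction-based invoicing.

```python
from typing import Callable, Protocol


class AddressValidator(Protocol):
    """What the business process depends on: only the service contract."""

    def validate(self, address: str) -> bool: ...


class OnPremiseValidator:
    """Locally installed variant; the operator must keep reference data current."""

    def __init__(self, reference_data):
        self._reference = frozenset(reference_data)

    def validate(self, address: str) -> bool:
        return address in self._reference


class OnDemandValidator:
    """Internet-based variant, billed per executed transaction."""

    def __init__(self, transport: Callable[[str], bool], price_cents_per_call: int):
        self._transport = transport          # stands in for the remote call
        self._price = price_cents_per_call   # assumed price, for illustration
        self.calls = 0

    def validate(self, address: str) -> bool:
        self.calls += 1
        return self._transport(address)

    def cost_cents(self) -> int:
        return self.calls * self._price


def register_customer(validator: AddressValidator, address: str) -> str:
    """Business logic that works with either provisioning model."""
    return "accepted" if validator.validate(address) else "needs review"


known = {"Rastatter Str. 13, 75179 Pforzheim"}
local = OnPremiseValidator(known)
remote = OnDemandValidator(lambda address: address in known, price_cents_per_call=2)

result_local = register_customer(local, "Rastatter Str. 13, 75179 Pforzheim")
result_remote = register_customer(remote, "Rastatter Str. 13, 75179 Pforzheim")
```

Because `register_customer` depends only on the contract, a company could start with the on-demand variant for a country with few addresses and switch to a local installation later without touching the business process.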
  9. Conclusions

     The following conclusions about two aspects can be drawn from the experience described above.

     (1) What are the requirements for data quality functions which are to be used in an SOA environment?

       • The functions must be accessible as services via SOAP. This guarantees integration into practically any infrastructure for the service-oriented implementation of business processes. In the details, however, care should be taken that the XML elements of the service description can be mapped to the corresponding constructs of the respective server environment. The decision as to whether SOAP-based and/or RESTful services are more appropriate is made depending on this.
       • The target systems and languages in which the service is to be used must be specified. The decision on how complex the modelling of the XML structures and XML data types of the call results may be depends on this.
       • The service must be designed to be stateless, i.e. it must function without storing an internal state between two calls. If a state between the calls is required, it must be transferred with the follow-up call or made persistent in a suitable manner.
       • The service must be scalable: if the Web Service requires global resources, e.g. a database connection, these should be administered by means of a suitable resource pool in the server container. Otherwise, these resources quickly become a bottleneck which prevents genuine scalability of the service.
       • The service should be Internet-capable, i.e. it should be irrelevant for the functionality of the service whether it is provided in the local network or via the Internet. This extends the possible applications enormously.

     (2) Which aspects have to be considered if functions for the validation, enhancement and processing of data are to be made SOA-capable?

       • The use scenarios must be clearly defined. The question as to whether the services are used within the business logic or the presentation logic is particularly important. In the first case, integration takes place in the application server, the Enterprise Service Bus or a BPEL engine; in the second case, in a graphical user interface, which can in turn make its own demands.
       • If use in the presentation layer is foreseeable, it must be checked whether a RESTful service implementation is necessary.
       • It should be checked whether an alternative use scenario, in which the service is provided via the Internet, offers commercial or technical advantages, and whether the service provider supports such a scenario.
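The recommendation to manage global resources through a pool instead of ad hoc code can be sketched as follows. This is a minimal illustration built on Python's standard `queue` module, not the pooling facility of any particular application server. The class name, the fake connection and the handler are hypothetical.

```python
import queue
from contextlib import contextmanager


class ResourcePool:
    """Fixed-size pool; callers borrow a resource and always give it back."""

    def __init__(self, factory, size):
        self._items = queue.Queue(maxsize=size)
        for _ in range(size):
            self._items.put(factory())

    @contextmanager
    def acquire(self, timeout=None):
        # Blocks until a resource is free; raises queue.Empty after the timeout.
        resource = self._items.get(timeout=timeout)
        try:
            yield resource
        finally:
            self._items.put(resource)

    def available(self):
        return self._items.qsize()


class FakeConnection:
    """Stands in for a database connection in this sketch."""

    def lookup(self, key):
        return {"75179": "Pforzheim"}.get(key)


def place_for_postcode(pool, postcode):
    """Stateless handler: everything it needs arrives with the call."""
    with pool.acquire(timeout=1.0) as connection:
        return connection.lookup(postcode)


pool = ResourcePool(FakeConnection, size=2)
place = place_for_postcode(pool, "75179")
```

The handler keeps no state between calls, so any number of instances can run in parallel, and the pool size becomes the single configurable limit on how many of them touch the shared resource at once.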
