Managing Data Integration Initiatives


This PowerPoint slide deck is from a presentation given at the Microsoft center in Waltham, MA, titled Leading Practices and Insights for Managing Data Integration Initiatives.

Topics covered include:
  • Key Drivers
  • Approaches and Strategy
  • Tools and Products
  • Useful Case Studies
  • Success Factors

  • Bulk Extract – uses copy management tools or unload utilities to extract all or a subset of the operational relational database. The extracted data may be transformed to the target format on the host or the target server; DBMS load tools are then used to refresh the target database.
  • File Compare – compares the newly extracted operational data to the previous version, then creates a set of incremental change records that are applied as updates to the target server within the scheduled process.
  • Change Data Propagation – captures and records changes to the file as part of the application change process; techniques include triggers, log exits, log post-processing, or DBMS extensions. The captured changes are written to a file of incremental changes.
  • Data stewardship involves taking responsibility for data elements across their end-to-end usage throughout the enterprise.

    1. Leading Practices and Insights for Managing Data Integration Initiatives (May 7, 2010)
    2. Agenda
       • Introductions
       • Overview
       • Key Drivers
       • Approaches / Strategy
       • Tools
       • Case Studies
       • Success Factors / Lessons Learned
       • Q&A
    3. About Allin
       • Wakefield, MA-based service provider of enterprise-quality solutions and services to small and large companies
       • The consulting practice provides technical and business expertise that helps businesses implement strategic solutions, integrate key functions, and extend technical capability. Services include:
         - Integrative application development and performance tuning
         - Systems integration and management
         - Data migration and comprehensive design services
       • Solution focused in two key areas:
         - Microsoft® SharePoint®
         - Virtualization software from Microsoft® and VMware®
       • Mark Bramhall, CTO
         - Over 35 years of experience working with small and large firms
         - Designed and implemented numerous data integration solutions
         - Experienced technologist and integrator
    4. About Optima
       • Middleton, MA-based technology consulting firm providing IT leadership for both strategic and tactical projects
       • The consulting practice specializes in advising small- to mid-sized firms on optimizing their technology investments and improving business processes. Key services include:
         - Application implementations
         - Data integrations
         - Technology optimization
         - Business process transformation
       • Irving Burday, President
         - Experienced technology leader (CIO at several companies)
         - Led several data integration and data warehouse projects
         - Directed / managed many complex integration efforts
    5. Data Integration Overview
       • Definition: combining data residing in different sources and providing users with a unified view of these data
       (Diagrams: mediated schema example; data warehouse example)
    6. Business / Technical Drivers
       • Application integrations
         - Legacy-to-new-system migration and conversion
         - Legacy system feeds to the new system
         - New application / Web site processing
         - Retirement of legacy applications
       • Business intelligence / analytics
         - Data aggregation for data mining
         - Supporting predictive models and analytics
         - Feeding decision support and other business intelligence needs
    7. Business / Technical Drivers
       • Data architecture improvements
         - Managing feeds to/from the data warehouse
         - Reorganization of operational data stores and marts
         - Coordinating feeds to/from legacy systems
         - Coordinating feeds to/from external entities
       • External factors
         - Supporting acquisitions / divestitures
         - Managing mergers
         - Facilitating new channels and/or sales growth
    8. Key Variables / Considerations
       • Business related
         - Scale of effort
         - Availability / skills of resources
         - Funding
         - Timing
       • Technical factors
         - Platforms and operating systems
         - Data management software
         - Data models, schemas, and data semantics
         - Middleware
         - User interfaces
         - Frequency of integrations
         - Business rules and integrity constraints
    9. Approaches / Strategies
    10. Approaches / Strategies
       • The integration approach depends on the architectural level
    11. Approaches
       Several strategies exist for integrating data:
       • Manual integration: users interact directly with all relevant information systems and integrate the data by hand. This requires detailed knowledge of the logical data representation and data semantics, as well as dealing with different interfaces and query languages.
       • Application specific: modifying applications, or layering application-specific code around an application, to enable it to take data from or give data to external data stores.
       • Data propagation: replicating data in different locations from different sources. Technologies include replication, database log scrapers, and change data capture software.
       • Data federation: a single unified virtual view of one or more source data files. A data federation technique normally employs a metadata reference file to connect related customer information based on a common key.
    12. Application-Specific Solutions
       Application-specific tools and utilities are frequently provided by vendors to integrate and manage data. Key considerations for this approach:
       • Developing and using them requires deep system knowledge
       • Best results for special-purpose applications
         - A new data source requires new code to be written
         - Usually optimal for one-time conversions / migrations
         - Data cleanup frequently requires multiple human interventions
         - Fragile if the underlying data sources are updated or changed (changes may affect the application)
         - Can be expensive in terms of time and skills
    13. Data Propagation
       Data propagation is the distribution of data from one or more source data warehouses to one or more local access databases. Methods include:
       • Bulk extract: uses copy management tools or unload utilities to extract all or a subset of the operational relational database. The extracted data may be transformed to the target format on the host or the target server.
       • File compare: compares the newly extracted operational data to the previous version, then creates a set of incremental change records that are applied as updates.
       • Change data propagation: captures and records changes to the file as part of the application change process. Techniques include triggers, log exits, or DBMS extensions. The captured changes are written to a file of incremental changes.
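The "file compare" method above can be sketched in a few lines: diff a fresh extract against the prior snapshot and emit incremental change records. This is an illustrative sketch, not a tool the deck names; the key-to-row dictionary layout is an assumption for the example.

```python
# Hypothetical "File Compare" propagation sketch: compare the new extract to
# the previous snapshot and emit INSERT / UPDATE / DELETE change records.

def file_compare(previous, current):
    """Return incremental change records between two extract snapshots."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("INSERT", key, row))       # new record
        elif previous[key] != row:
            changes.append(("UPDATE", key, row))       # changed record
    for key in previous:
        if key not in current:
            changes.append(("DELETE", key, None))      # removed record
    return changes

# Invented sample snapshots (id -> row) for illustration
previous = {1: {"name": "Acme", "tier": "gold"},
            2: {"name": "Globex", "tier": "silver"}}
current  = {1: {"name": "Acme", "tier": "platinum"},   # tier changed
            3: {"name": "Initech", "tier": "bronze"}}  # new; Globex dropped

for op, key, row in file_compare(previous, current):
    print(op, key, row)
```

The change records would then be applied to the target in a scheduled process, as the slide describes.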
    14. Data Federation
       Links data from two or more physically different locations, making access appear transparent, as if the data were co-located (versus a data warehouse, which houses data in one location). Key elements of a federated approach:
       • Middleware consisting of a database management system
       • Uniform access to a number of heterogeneous data sources
       • A query language used to combine, contrast, analyze, and manipulate the data
       • Data integration through database integration
       • Combining data from multiple sources with a single SQL statement
       • Creating a master system that relates data elements from all line-of-business systems
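A minimal way to see "one SQL statement across multiple sources" in action is SQLite's ATTACH, which joins tables living in two separate databases through a common key. The table and column names are invented for the example; a production federation layer would sit over heterogeneous remote sources, not two local SQLite files.

```python
# Data federation in miniature: two physically separate SQLite databases
# queried with a single SQL statement, joined on a common customer key.
import sqlite3

conn = sqlite3.connect(":memory:")                     # "CRM" database
conn.execute("ATTACH DATABASE ':memory:' AS billing")  # second, separate database

conn.execute("CREATE TABLE crm_customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (cust_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm_customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# One SQL statement spanning both databases, linked by the common key
rows = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM crm_customers c
    JOIN billing.invoices i ON c.cust_id = i.cust_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```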
    15. Data Integration Tools
    16. ETL: Extract, Transform, and Load
       • ETL tools extract data from one or more chosen sources, transform it into new formats according to business rules, and then load it into one or more target data structures
       • Enables rules and processes for managing diverse data sources and processing high volumes of data
       • Provides direct insight into source data before processing, along with data profiling and quality control capabilities
       • Provides the ability to map physical data items to a unique metadata description, or to create an abstraction layer of common business definitions that maps all similar data items to the same definition
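The extract / transform / load shape described above can be reduced to a toy pipeline. This is a sketch under invented data and rules; real ETL tools (SSIS, DataStage, and the others listed in the appendix) add scheduling, lineage, profiling, and error handling on top of this same three-stage structure.

```python
# Toy ETL pass: extract raw rows, apply business rules, load into a target.
# Source rows and the "revenue in thousands" rule are invented for illustration.

def extract(source):
    return list(source)              # pull raw rows from the source

def transform(rows):
    out = []
    for row in rows:                 # business rules: normalize name, derive field
        out.append({"name": row["name"].strip().title(),
                    "revenue_k": round(row["revenue"] / 1000, 1)})
    return out

def load(rows, target):
    target.extend(rows)              # write transformed rows to the target

source = [{"name": "  acme corp ", "revenue": 1250000},
          {"name": "GLOBEX", "revenue": 480000}]
target = []
load(transform(extract(source)), target)
print(target)
```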
    17. ETL Integrated Architecture
       (Architecture diagram: core business systems (financial systems, CRM systems, business-line systems, external sources, and others) feed an extract, transformation, and load (ETL) layer that also provides message brokering, metadata management, security management, and administration. The ETL layer populates the information management layer, the "unified business model": subject-area data storage with conformed facts and shared dimensions; OLAP cubes and predictive models with pre-aggregated data to support variance analysis, exception reporting, and drill-down; real-time drill-through into relational storage; and structured and unstructured content. A presentation layer makes the results accessible throughout the organization via portals/dashboards, production and ad-hoc reports, queries, extracts, exception notifications, data mining, and other analysis and reporting tools.)
    18. EAI: Enterprise Application Integration
       • An integration framework composed of a collection of technologies and services
       • A centralized broker that handles security, access, and communication
       • An independent data model based on a standard data structure (e.g., XML)
       • A connector or agent model in which each vendor, application, or interface builds a single component that speaks natively to that application and communicates with the centralized broker
       • A system model that defines the APIs, data flow, and rules of engagement so that components can be built to interface with it in a standardized way
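The broker-plus-connector model above can be sketched as hub-and-spoke code: a connector translates each application's native record into a shared canonical message, and a central broker routes it by topic. All class, topic, and field names here are illustrative assumptions, not any real EAI product's API.

```python
# Minimal hub-and-spoke EAI sketch: connectors publish canonical messages
# to a central broker, which routes them to subscribers by topic.

class Broker:
    def __init__(self):
        self.subscribers = {}        # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers.get(topic, []):
            handler(message)

def crm_connector(native_record, broker):
    # Translate the source system's native field names into the
    # independent canonical data model before publishing.
    canonical = {"entity": "customer",
                 "id": native_record["CUST_NO"],
                 "name": native_record["CUST_NM"]}
    broker.publish("customer.updated", canonical)

received = []
broker = Broker()
broker.subscribe("customer.updated", received.append)
crm_connector({"CUST_NO": 42, "CUST_NM": "Acme"}, broker)
print(received)
```

Because every connector targets the one canonical model, adding an application means writing one connector rather than point-to-point interfaces to every other system.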
    19. EAI Architecture (diagram)
    20. Tool Comparison
       • ETL versus EAI: what's the difference?
    21. ETL Tools: Key Features
       • Architecture
         - Parallel processing
         - Scalability (job distribution, pipelining, partitioning)
         - Common Warehouse Model (CWM) compliance
         - Version control
       • ETL functionality
         - Managing data streams (multiple targets, splitting)
         - Pivoting, de-pivoting, unions
         - Lookups
         - Scheduling
         - Error handling
    22. ETL Tools: Key Features (continued)
       • Reusability
         - Reuse of components
         - Decomposition
       • Debugging
         - Step by step, row by row, breakpoints
         - Compiler / validator
       • Connectivity
         - Native connection support (ODBC, OLE DB, flat files)
         - Integration with package / application metadata
         - Data quality, data validation
       • Ease of use
         - WYSIWYG
         - Documentation
    23. Challenges
    24. Challenges
       • Data preparation / quality
         - Completeness / accuracy of data records
         - Duplicates
         - Half match-able data
         - Freshness of data
       • Technology issues
         - Multiple and mixed data formats
         - Disparate operating systems and processing platforms
         - Source system constraints
       • Organizational
         - Business and IT politics
         - Ownership / stewardship of source data
         - Dedication of IT resources to manage daily functions
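One common remediation for the "duplicates" and "half match-able data" challenges above is to collapse records onto a normalized match key. The normalization rules below are deliberately simple assumptions for illustration; production record matching uses far richer techniques (fuzzy matching, survivorship rules, reference data).

```python
# Toy duplicate detection: normalize company names into a match key so
# variant spellings collapse onto one record. Rules are illustrative only.
import re

def match_key(name):
    key = name.lower()
    key = re.sub(r"\b(inc|corp|llc|ltd)\b\.?", "", key)   # drop legal suffixes
    return re.sub(r"[^a-z0-9]", "", key)                  # strip punctuation/spaces

records = ["Acme Corp", "ACME, Inc.", "Globex LLC", "acme corp."]
unique = {}
for rec in records:
    unique.setdefault(match_key(rec), rec)   # first record per key survives
print(sorted(unique.values()))
```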
    25. Challenges (continued)
       • Level of automation
         - Time based
         - Event based
         - Frequency
       • Error handling
         - Reporting requirements
         - Ownership of error remediation
         - Technical failures
         - Data failures
         - Auto-correction versus manual updating
         - Batch integrity
    26. Integration Challenges (continued)
       • Data handling
         - Transformations
         - Manipulations
         - Transmission
       • Magnitude of effort
         - Number of systems
         - Volume of data
         - Number of runs
    27. Case Studies
    28. Business Case 1: Service Company (Revenue: $200M, Size: 300 FTEs)
       • Client's business challenge:
         - Integrating data from customer Web sites / CRM systems into operational and financial systems
         - The client's objective was to build a one-time solution to manage data migrations
       • Solution:
         - Use SSIS to develop a data migration framework that allows transformation of data
         - Build custom stored procedure scripts to extract data from legacy applications
       • Lessons learned:
         - Data rules and manipulations required extensive analysis and documentation in order to streamline the future-state process
         - Created a cross-tabular map of legacy application tables to facilitate data mappings and data handling procedures during conversion and testing activities
    29. Business Case 2: Education Company (Revenue: $500M, Size: 300 FTEs)
       • Client's business challenge:
         - Integrating data from a set of outsourced-function partners
         - Integration needed to be real time as clients moved through the Web sites, but could not fail in the face of network outages, system failures, etc.
       • Solution:
         - Understand, for each type of data, who is the data master and who only keeps shadow copies
         - Design a way to uniquely identify data, even if multiple sources can create it
         - Deploy a publish / subscribe solution using reliable, persistent message queuing
       • Lessons learned:
         - You cannot know your data too well; subtle relationships must become explicit
         - Multi-partner integration requires extremely simple interfaces and definitions
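Case 2 hinges on uniquely identifying data even when multiple partners can create it. One conventional scheme, shown here as an assumption rather than the deck's actual design, is to namespace every record key by its originating system, so the same local ID from two partners can never collide.

```python
# Hypothetical globally unique record IDs for multi-source creation:
# prefix the local key with its source system, or generate a UUID when
# the source has no stable local key.
import uuid

def make_record_id(source_system, local_id=None):
    """Return a collision-free ID: source prefix plus local key or UUID."""
    suffix = local_id if local_id is not None else uuid.uuid4().hex
    return f"{source_system}:{suffix}"

a = make_record_id("registrar", 1001)
b = make_record_id("billing", 1001)   # same local id, different source
print(a, b, a != b)
```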
    30. Business Case 3: Healthcare Company (Revenue: $150M, Size: 200 FTEs)
       • Client's business challenge:
         - Integrating data feeds from source systems into a new data warehouse
         - Implementing a data hub to manage data feeds from external entities (e.g., customers, banks) into financial and customer support systems
       • Solution:
         - Select and implement a full-featured ETL tool to manage the data warehouse and miscellaneous data feeds
         - Create data extracts from sources to manage extract requirements and file formats
         - Deploy a data quality program that cleansed incoming and transferred data prior to loading into the destination system
       • Lessons learned:
         - Error handling required additional time and effort to define error cases and remediation actions
         - Data ownership required executive intervention to staff and manage the data management process
    31. Conclusions
    32. The Value of a Data Quality Effort
       • Data remediation
         - Data management processes cannot allow junk data to be loaded, migrated, or transported into a target system
         - Data remediation procedures should be designed into every solution
       • Key performance indicators: data quality compliance
         - Data quality indicators should be defined and monitored at all times
         - The KPIs should be used by the data management operations team to manage data processing and testing
         - The KPIs for management must be business focused and should show how poor data quality is financially affecting the business
    33. The Value of a Data Quality Effort
       • Check twice, load once
         - Data should be checked for validity prior to being loaded into the target
         - Designers and developers must log the exact data quality errors and issues present in the data being processed
         - The data quality errors and issues must be summarized and reported on; the reports can be used by operations and source data owners to remediate the data and drive data compliance
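"Check twice, load once" can be sketched directly: validate each row before loading, log the exact quality error for every reject, and load only the clean rows. The validation rules below are illustrative stand-ins for real business validations.

```python
# Check-twice-load-once sketch: rows failing validation are diverted to an
# error log (with the exact issue recorded) instead of reaching the target.

def validate(row):
    """Return the list of data quality errors for one row."""
    errors = []
    if not row.get("id"):
        errors.append("missing id")
    if row.get("amount") is None or row["amount"] < 0:
        errors.append("invalid amount")
    return errors

def check_and_load(rows, target, error_log):
    for row in rows:
        errors = validate(row)
        if errors:
            error_log.append({"row": row, "errors": errors})  # exact issue logged
        else:
            target.append(row)

rows = [{"id": 1, "amount": 99.5},
        {"id": None, "amount": 10.0},    # fails: missing id
        {"id": 3, "amount": -5.0}]       # fails: invalid amount
target, error_log = [], []
check_and_load(rows, target, error_log)
print(len(target), len(error_log))
```

Summaries of the error log are what operations and source data owners would use to remediate the data, as the slide suggests.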
    34. Importance of Governance
       • Poor governance and lack of communication account for over 85% of the issues in a data integration project
       (Chart of issue causes: inadequate project management 32%; lack of communication 20%; unfamiliarity with scope and complexity 17%; failure to define objectives 17%; incorrect hardware or software 7%; other 5%; unlabeled segment 2%)
    35. Data Stewardship
       Data stewards act as the conduit between IT and the business and accept accountability for the data management process. They play the central role in managing data across the organization and in assuring its usefulness to the business.
       Data stewards become the "public face" for data and have the following responsibilities:
       • Domain values
       • Data standards
       • Business rule specifications
       • Data ownership rules
       • Data quality rules
       • Security requirements
       • Data retention criteria
       (Diagram: business data stewards bridging IT and the business)
    36. Success Factors
       • Establish and agree upon scope, high-level requirements, expected benefits, and architecture
         - Benefits need to be emphasized from the top down and understood from the bottom up
       • Data integrity and data cleansing cannot be over-emphasized
         - Even well-documented systems are usually prone to poor data quality
         - Common definitions and mappings are crucial
       • A complex business is not made any less complex by documenting the data and putting it in an operational store
         - Knowledgeable use of the data will still require knowledgeable users
    37. Success Factors (continued)
       • Technology is only part of the answer
         - No matter how sophisticated the implementation, significant process change will be required
       • But technology is key to success
         - Having a partner who has done this before will minimize risk
         - Much can be learned from similar efforts
       • The effort requires a full-time, dedicated set of highly skilled resources
         - Both technical and business knowledge are required
    38. Appendices
    39. ETL Vendors
       • Microsoft: SQL Server Integration Services
       • Oracle: Oracle Warehouse Builder (OWB)
       • SAP Business Objects: Data Integrator & Data Services
       • IBM: IBM Information Server (DataStage); IBM Data Manager / Decision Stream (Cognos)
       • SAS Institute: SAS Data Integration Studio
       • Informatica: PowerCenter
       • Ab Initio: Co>Operating System
       • Information Builders: Data Migrator
       • Adeptia: Adeptia Integration Server
       • CastIron Systems: OmniConnect Platform
       • Pitney Bowes Business Insight: DataFlow Manager
       • Pervasive: Data Integrator
       • Elixir: Elixir Repertoire
       • Javlin: CloverETL
       • Pentaho: Pentaho Data Integration
       • Talend: Talend Open Studio
    40. ETL / EAI Tool Strengths
       • ETL excels at bulk data movement; EAI is limited in data movement capabilities
       • ETL provides complex transformations, aggregation from multiple sources, and sophisticated business rules; EAI offers less sophisticated transformation and extraction functions
       • ETL assumes data delays; EAI operates in real time
       • ETL tools are batch-oriented, making them fast and simple for one-time projects and testing; EAI works better with continuously interacting systems
       • ETL offers little in the way of workflow; EAI is workflow-oriented at the core
       • ETL works primarily at the session layer; EAI works primarily at the transport layer