How to Use the Right Tools for
Operational Data Integration
Mark R. Madsen – March, 2009
http://ThirdNature.net

Webcast on data integration outside the data warehouse in operational contexts and how open source fits in this area.

To download the slides or listen to a replay, look for this talk under "How to Use the Right Tools for Operational Data Integration" at http://www.talend.com/webinar/archive/

Detailed Description:
Data integration tools were once used solely in support of data warehousing, but that has been changing over the past few years. The fastest-growing area today for data integration is outside the data warehouse, whether it's one-time data movement for migrations and consolidations or real-time data synchronization for master data management projects.

Data integration tools have proven to be faster, more flexible and more cost effective for operational data integration than the common practice of hand-coding or using application integration technologies. The developer focus of these technologies also makes them a prime target for open source commoditization.

During the presentation you will learn about the differences between analytical and operational data integration, technology patterns and options, and recommendations for how to begin using tools for operational data integration.

In particular, you will learn:
- How to map common project scenarios to integration architectures and tools
- The technology and market changes that favor use of tools for operational data integration
- The differing requirements for operational vs. analytic data integration
- Advantages of open source for data integration tasks


  1. How to Use the Right Tools for Operational Data Integration
      Mark R. Madsen – March, 2009
      http://ThirdNature.net
      Attribution-NonCommercial-No Derivative: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
  2. What We’re Asked For
      (simulation)
      Slide 2 March 2009 Mark R. Madsen
  3. How It Makes Us Feel
  4. How We Want to Feel
  5. Spending Priorities in IT
      In 2007 and 2008 this is where the money went… but you can’t do most of these without data integration.
      Sources: CIO Insight
  6. Technology Priorities in IT
      Data integration moved up to the #3 spot for CIOs in 2008.
      Sources: CIO Insight
  7. The Cost Problem Management Reacts To
      Source: IDC
  8. Where We Often Are Today: Point to Point
      Typical scenario:
        • Disparate data
        • Heterogeneous sources
        • Point integration
        • Minimal reuse
        • No tools
      [Diagram: Source Environments (Databases, Documents, Flat Files, XML, Services, ERP, Applications)]
  9. The Desired Future State
      “Data as a platform” provides:
        • Standards-based interfaces
        • Single views of disparate source data
        • Single point of access / integration
        • Reuse of data
      …but you can’t achieve this by writing more application code.
      [Diagram: Data Platform; Source Environments (Databases, Documents, Flat Files, XML, Services, ERP, Applications)]
  10. Application versus Data Integration
      Application Integration                           | Data Integration
      Managing the flow of events                       | Managing the flow of data and access
      Standardizes the transaction or service           | Standardizes the data
      Tools abstract the transport and system endpoints | Tools abstract the transport, system, representation and manipulation
      Must write code at endpoints to manipulate data   | Data structure, format and manipulation are abstracted
      Focus on code - data as a byproduct               | Focus on data - data as the product
      Reusable functions, not data                      | Reusable data, not functions
  11. Analytic versus Operational Data Integration
      Analytic                                                   | Operational
      Most of a BI project’s effort is spent on data integration | Most of an application project’s effort is focused on features, not DI
      Many disparate sources                                     | One or a few sources
      Generally unidirectional                                   | One-way or bidirectional
      Large data volumes                                         | Large data volumes for some projects, small volumes for others
      Usually loaded daily                                       | Often loaded more frequently; varies by project type
      Low concurrency                                            | Low to high concurrency
      High latency                                               | Low to high latency
  12. Architectural Models for Data Integration
      [Diagram: Data Access Model (Physical vs. Virtual) plotted against Control (Distributed vs. Centralized)]
  13. Consolidation
      Common operational DI scenarios where this model is appropriate:
        • Migrations
        • Upgrades
        • Consolidations
        • Managing master / reference data
      Characteristics:
        • Large data volumes to move or access
        • One-time data movement
        • Usually unidirectional
        • Transformation or cleansing required
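To make the consolidation characteristics concrete (one-time, one-way movement with cleansing along the way), here is a minimal Python sketch of a migration between two SQLite databases. It is an illustration under assumptions, not the author's method. The `legacy_customer` and `customer` tables, their columns, the cleansing rules, and the `migrate_customers` function are all hypothetical. A real migration of large volumes would also batch the inserts and log rejected rows.

```python
import sqlite3


def migrate_customers(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """One-time, one-way move of customer rows into a new schema, cleansing
    values on the way. Table and column names are hypothetical."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT, country TEXT)"
    )
    cleaned = []
    for cust_no, name, email, country in source.execute(
        "SELECT cust_no, cust_name, email, country FROM legacy_customer"
    ):
        name = (name or "").strip().title()
        if not name:
            continue  # reject rows that fail a basic quality rule
        email = email.strip().lower() if email else None
        country = (country or "").strip().upper() or None
        cleaned.append((int(cust_no), name, email, country))
    with target:  # one transaction: commit on success, roll back on error
        target.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", cleaned)
    return len(cleaned)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE legacy_customer (cust_no TEXT, cust_name TEXT, email TEXT, country TEXT)"
    )
    src.executemany(
        "INSERT INTO legacy_customer VALUES (?, ?, ?, ?)",
        [("1", "  ada lovelace ", " ADA@Example.com", "gb"), ("2", "", "x@example.com", "us")],
    )
    dst = sqlite3.connect(":memory:")
    print(migrate_customers(src, dst))  # 1 (the row with an empty name is rejected)
    print(dst.execute("SELECT * FROM customer").fetchall())
```

Doing the cleansing in the same pass as the move reflects the slide's point that consolidation usually requires transformation. Loading everything in one transaction keeps a failed one-time run from leaving a half-migrated target.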
  14. Propagation
      Common scenarios:
        • Copying data that can’t be accessed directly / remotely
        • Synchronizing data
        • Data cross-referencing
        • Infrequent / one-time extracts
      Characteristics:
        • Can be one-way or bidirectional
        • Often repetitive data movement
        • Medium to large data volumes (but not always)
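The propagation scenarios of synchronizing and cross-referencing data can be sketched as a repeatable one-way job. The job keeps two pieces of state: a cross-reference from source keys to target keys, and a high-water mark so each run copies only rows changed since the last one.

This minimal Python sketch rests on several assumptions:
- The record layout, the `Propagator` class, and the in-memory dict standing in for the target system are all hypothetical.
- Each source row is assumed to carry a monotonically increasing `modified` counter.
- Bidirectional propagation would also need conflict resolution, which is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Propagator:
    """Repeatable one-way propagation from a source to a target (hypothetical layout)."""
    xref: dict[str, int] = field(default_factory=dict)  # source key -> target id
    high_water: int = 0  # largest 'modified' value already copied
    next_target_id: int = 1

    def run(self, source_rows: list[dict], target: dict[int, dict]) -> int:
        """Copy rows changed since the last run; return how many were copied."""
        changed = sorted(
            (r for r in source_rows if r["modified"] > self.high_water),
            key=lambda r: r["modified"],
        )
        for row in changed:
            target_id = self.xref.get(row["key"])
            if target_id is None:  # first sighting: allocate a target id and record it
                target_id = self.next_target_id
                self.next_target_id += 1
                self.xref[row["key"]] = target_id
            target[target_id] = {"name": row["name"], "source_key": row["key"]}
            self.high_water = row["modified"]
        return len(changed)


if __name__ == "__main__":
    source = [
        {"key": "A-17", "name": "Acme", "modified": 1},
        {"key": "B-02", "name": "Globex", "modified": 2},
    ]
    target: dict[int, dict] = {}
    job = Propagator()
    print(job.run(source, target))  # 2: both rows are new
    source[0].update(name="Acme Corp", modified=3)
    print(job.run(source, target))  # 1: only the changed row is copied, into the same target id
    print(target)
```

The cross-reference is what lets a changed source row update its existing target record instead of creating a duplicate. The high-water mark is what keeps repetitive runs cheap.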
  15. Federation
      Common scenarios:
        • Real-time / low-latency data access
        • Security / regulatory requirements that prevent copying data
        • Impractical to create a central database (e.g., number of sources, latency)
        • Centralized data services
      Characteristics:
        • One-way
        • Lower data volumes
        • Higher concurrency
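Federation assembles a single view from each source at request time instead of copying data into a central store. This is why it fits low-latency access and data that may not be copied.

This minimal Python sketch uses two SQLite connections as stand-ins for separate systems. The `customer` and `invoice` tables and the `customer_view` function are hypothetical. Every call queries both sources, so each request costs more but always sees current data.

```python
import sqlite3
from typing import Optional


def customer_view(crm: sqlite3.Connection, billing: sqlite3.Connection,
                  customer_id: int) -> Optional[dict]:
    """Build one customer's view on demand by querying both systems in place.
    Nothing is copied to a central store; table names are hypothetical."""
    row = crm.execute(
        "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    (open_cents,) = billing.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM invoice "
        "WHERE customer_id = ? AND paid = 0",
        (customer_id,),
    ).fetchone()
    return {"id": row[0], "name": row[1], "open_balance_cents": open_cents}


if __name__ == "__main__":
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    crm.execute("INSERT INTO customer VALUES (7, 'Acme')")
    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE invoice (customer_id INTEGER, amount_cents INTEGER, paid INTEGER)")
    billing.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                        [(7, 10000, 0), (7, 5000, 1), (8, 2000, 0)])
    print(customer_view(crm, billing, 7))   # {'id': 7, 'name': 'Acme', 'open_balance_cents': 10000}
    print(customer_view(crm, billing, 99))  # None
```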
  16. Choosing Models
      There are some basic criteria and tradeoffs to consider:
        • Data currency vs. latency
        • Diversity of data sources
        • Data cleansing & transformation
        • Predictability of performance
        • Need to access the same data via different interfaces
        • Non-relational sources
        • Frequency of access
        • Data volumes
        • And more…
  17. A Handy Comparison Chart
      [Chart: Consolidation Model, Physical vs. Virtual, compared on these criteria:]
        • Data currency
        • Query performance / latency
        • Frequency of access
        • Diversity of data sources
        • Diversity of data types
        • Non-relational data sources
        • Transformation and cleansing
        • Predictability of performance
        • Multiple interfaces to same data
        • Large query / data volume
        • Need for history / aggregation
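Choosing a model is a judgment call, but a first cut can be written down as a rule of thumb. The sketch below is hypothetical. It encodes only characteristics stated on the Consolidation, Propagation and Federation slides:
- one-time vs. repetitive movement
- one-way vs. bidirectional flow
- whether copying is permitted
- latency and volume

It does not reproduce the comparison chart's ratings. The `Scenario` fields, the `suggest_model` function and the rule order are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    CONSOLIDATION = "consolidation"
    PROPAGATION = "propagation"
    FEDERATION = "federation"


@dataclass
class Scenario:
    copying_allowed: bool = True  # False when security/regulation prevents copying
    one_time: bool = False        # migration, upgrade, consolidation
    bidirectional: bool = False   # data must flow both ways
    low_latency: bool = False     # real-time access to current data
    large_volume: bool = False


def suggest_model(s: Scenario) -> Model:
    """Hypothetical rule of thumb; the first matching rule wins."""
    if not s.copying_allowed:
        return Model.FEDERATION     # slide 15: data can't be copied
    if s.one_time:
        return Model.CONSOLIDATION  # slide 13: one-time, usually unidirectional
    if s.bidirectional:
        return Model.PROPAGATION    # slide 14: the model described as bidirectional
    if s.low_latency and not s.large_volume:
        return Model.FEDERATION     # slide 15: low latency, lower volumes
    return Model.PROPAGATION        # slide 14: repetitive movement, medium to large volumes


if __name__ == "__main__":
    print(suggest_model(Scenario(one_time=True)).value)  # consolidation
```

Putting the "copying not allowed" rule first reflects that a regulatory constraint overrides performance preferences. The remaining order is a design choice you would tune to your own criteria.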
  18. Three Implementation Choices
        • Write code! It’s fun! It’s easy! At first.
        • Buy proprietary data integration tools
        • Use available open source tools
  19. Hand-coded Integration
      Why is this so common?
        • DI is an afterthought on application projects
        • It’s just data
        • It’s hard to justify expensive tools for operational DI (ODI)
        • Developers and DBAs don’t talk
      The market is changing:
        • Lower tolerance for the high cost of custom DI development and maintenance
        • External data challenges
        • Bad fit for consolidation projects
      Products get better over time. Hand-written code gets worse.
  20. Buying Data Integration Tools
      Buying is the usual alternative, mostly ETL tools.
        • ETL vendors are branching out
        • Many companies have ETL for BI
      But…
        • Poor fit for propagation and synchronization tasks
        • Centralized servers
        • Licensing costs / problems for consolidation tasks or broad use
      Integration code is single-purpose; tools are multi-purpose.
      You should always go with tools – when you can afford them.
  21. Use of Tools vs. Hand Coding
      [Bar chart: High Use, Medium Use, Low Use and None for ETL, EDR, EII and EAI; y-axis 0% to 60%]
      Source: TDWI, 2006
  22. Open Source: End of Buy vs. Build
      Open source avoids the pitfalls of coding and gains the advantages of using tools.
        • Tools can be distributed with few or no license restrictions
        • Application projects budget for features, not glue
        • Even basic tools have obvious operational advantages over hand-coding
      Why build custom code when there are comparable tools available?
  23. Benefits Reported
      After your organization adopted open source software, what was the primary benefit of its use?
        • Flexibility: 31%
        • Lower cost: 31%
        • Reduced dependence on vendors: 15%
        • Performance: 10%
        • Reliability: 7%
        • Security: 4%
        • Other: 3%
      Source: The 451 Group
  24. A Side Benefit of Flexibility
      [Chart: Comparison of time taken to evaluate tools]
      Source: Yankee Group
  25. Recommendations
      1. Differentiate between analytic data integration and operational data integration
      2. Stop hand-coding, including table replication and DBA SQL scripts, unless the problem really is trivial
      3. Use the right data integration model for the problem
      4. Augment existing data integration infrastructure with open source
      5. Make open source the default option for data integration tools
  26. Creative Commons
      Thanks to the people who made their images available via Creative Commons:
        • red pill blue pill - http://www.flickr.com/photos/rcrowley/2540057217/
        • red pill blue pill2 - http://www.flickr.com/photos/thomasthomas/258931782/
        • happy dog jumping in meadow - http://flickr.com/photos/cenz/16128560/
        • Writing code - http://flickr.com/photos/cdm/72250667/
        • Woodworking - http://flickr.com/photos/rigoletto/126367565/
        • Febo - http://flickr.com/photos/jshyun/1573065713/
        • open_air_market_bologn - http://flickr.com/photos/pattchi/181259150/
  27. Thanks
  28. Creative Commons
      This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
