Unlocking value from data with data integration tools


Every day, consumers, businesses and not-for-profit organisations generate increasing volumes of data. Initiatives such as Smart Meters in the utilities sector, along with user-generated 'Web 2.0' data sources and High Energy Physics, are driving exponential growth in available data. Many businesses seek to take advantage of this data to analyse business performance or understand trends in customer or prospect behaviour.

This analysis often involves very high-volume, complex data sources. Bringing them together in a format that is easy for analysts to understand and query is often very challenging, particularly when business requirements for the data change and a rapid response can mean the difference between profit and loss.

This is just one of many areas in which Data Integration (DI) tools and technologies are being applied, providing the 'plumbing' from a source system to a target system. DI tools are designed to offer an order-of-magnitude increase in developer productivity compared with using languages such as SQL, Java and .NET. This productivity allows developers to deliver more quickly, respond to changes faster or deliver more with fewer resources.

According to Gartner, the market for such tools is estimated to grow to $2.7 billion by 2013, and is currently dominated by a handful of enterprise-class vendors. However, a new crop of Data Integration tools is emerging, with a mix of open source and commercial offerings that seek to challenge the dominance of the established players.

This talk will discuss the history of this area of technology to help understand the conditions we see today, offer a view of the future of the market and describe how these tools can help drive value within today's business and academic communities. At the end of the talk, attendees will have an opportunity to use one of the commercial tools and make their own minds up about the value of such technology.

Phil Watt is Principal Consultant at one of the world’s largest Systems Integrators, and has been working with high volume enterprise data for more than 17 years, building and designing data warehouses for customers in telco, media, utilities and financial services sectors. During the last 10 years, Phil has worked with a number of Data Integration technologies and advised many businesses about choosing a DI tool and applying best practices in their deployment.

Published in: Technology
  • eBay: 2 Petabytes and 6.5 Petabytes; Facebook: 2.5 Petabytes; Wal-mart: 2.5 Petabytes; Yahoo: > 10 Petabytes planned; LHC (Large Hadron Collider, Year 1): 10 Petabytes data/year; National ID Cards (planned estimate): > 2 Terabytes
  • Many tools have claimed this in the past
  • Two types: engine-based (Informatica, Ab Initio, expressor, etc.) and code generators (ETI, Talend, etc.)
  • Databases; different character sets (ASCII, EBCDIC, Unicode); international characters (Unicode); queues; Web Services (SOAP, WSDL, RPC); XML; ODBC/JDBC
  • Features listed up to 2004 represent minimum marketable features for new entrants to the marketplace
  • Describe value of each. Workflow optimisation is the key driver now. Early tools focussed on selling developer features, with strengths around complexity rather than value to the delivery process.
  • Almost weekly news of M&A
  • Example of one analyst business’s view of the DI Tools marketplace. Gartner’s Magic Quadrant provides a view of eligible vendors in the marketplace. Indicates this is a mature market, with considerable global interest and healthy competition. Also notable that HP, for example, does not have a tool in this space. There may be vendors not in the Magic Quadrant that are worth considering; don’t rule out vendors based on inclusion/exclusion from this report.
  • Goes much further than illustrated in this slide. Governance must apply structures to manage quality of data. Enterprises must incentivise people to maintain and improve data quality. You cannot manage what you can’t measure. Metrics must align to personal objectives.

    1. Unlocking value from data with Data Integration Tools
       Phil Watt, Principal Integration Architect, HP Business Intelligence Solutions, EMEA
       29/04/2010
    2. Outline
       • Introduction
       • Business drivers – why use a DI tool? (the challenge; private sector; public sector)
       • Background and history
       • DI tools timeline
       • Emerging features – and value
       • Governance and Best Practice
       • Selecting a tool for your situation
       • Demonstration
       • Summary – followed by hands-on session
    3. About me
       • 19 years big data
       • 10 years Data Integration tools
       • High volume
       • Complex business rules
       • Governance and metadata management
       • Clients include: BSkyB, BT, Barclays/Barclaycard, Centrica, Experian, John Lewis Partnership, Microsoft, a major UK political party
       • Strong focus on pragmatic delivery
       • Best practices
       • Design patterns
       • Tool evaluation, selection and implementation
    4. Scope
    5. Glossary
    6. The challenge
    7. Data warehouse example sizes
    8. Public and academic examples
       • Birmingham City Council
         http://www.experian.co.uk/www/pages/about_us/our_clients/
         http://www.qas.co.uk/company/press/new-experian-software-helps-public-sector-to-enhance-single-citizen-view-projects-503.htm
       • University of Toulouse – academic medical research
         http://www.talend.com/open-source-provider/casestudy/CaseStudy_Academic_Medical_Research_EN.php
    9. Benefits of DI tools
    10. Extract, Transform and Load
        • e.g. CRM or ERP system
        • Hub and spoke
        • Shared DW and ETL server
    11. Extract, Load and Transform
        • e.g. CRM or ERP system
        • Shared DW and ETL server
    12. ETL versus ELT
    13. Multiple sources and targets
    14. DI Tools Features Timeline 1995 – 2005
    15. DI Tools Features Timeline from 2006
    16. Market features
    17. Gartner Magic Quadrant
        Taken from research document, ‘Magic Quadrant for Data Integration Tools’
        Authors: Ted Friedman, Mark A. Beyer, Eric Thoo
        Full report available by registering at www.talend.com
        Image removed for web publication as agreed with Gartner
    18. Magic Quadrant Disclaimer
        The Magic Quadrant is copyrighted November 25, 2009 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner's analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the "Leaders" quadrant. The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
    19. Best practices
    20. Worst Practices
    21. Gartner advice
        • Allocate minimum 20% to data source analysis
        • Allocate 20 - 30% to mapping and transformation rules
        • Avoid custom-coding or desktop tools
        • Increase business user involvement to improve success
        Best Practices Mitigate Data Migration Risks and Challenges – May 2009
    22. Governance and the data integration lifecycle
    23. Best practices
        Do:
        • Spend 50% of project time doing discovery, analysis, design
        • Get business users involved early and often
        • Use tools to accelerate and compress timescales
        • Pay attention to governance and metadata
        So you can:
        • De-risk the project
        • Reduce overall cost and timescales
        • Achieve best possible quality
    24. Selecting a tool for your situation
    25. Qualification matrix (PW)
    26. Demonstration
    33. Demo metrics
        Performance
        • Hardware: dual core 2.0 GHz Intel Centrino, 2.5 GB RAM
        • Environment: WinXP, Oracle Express (DB) + DI tool (Expressor 2.0)
        • 3 data sources:
            Customers             155 MB    1000K records
            Today’s orders        112 MB    100K records
            Yesterday's orders    0.3 MB    3K records
            Total data volume     267 MB    1.1M records
        • Execution time: 72 seconds
        • Throughput: 3.7 MB/sec, 41k/sec
    34. Demo features
        Developer Productivity
        • Graphical development
        • Semantic Rationalisation and Re-usable Business Rules
        • Demo represents a generic business scenario
        • XML, message queues (MSMQ), database inputs/outputs, joins, aggregations and referential integrity management
        • Similar features to the ATG/Integrated Basket challenges?
    35. Summary
        • Business drivers – why use a DI tool? (the challenge; private sector; public sector)
        • Background and history
        • DI tools timeline
        • Emerging features – and value
        • Governance and Best Practice
        • Selecting a tool for your situation
        • Demonstration
    36. Questions
    37. References
        • Curt Monash: http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/
        • Wired: http://www.wired.com/wired/archive/12.04/grid.html
        • ZDNet: http://blogs.zdnet.com/storage/?p=213
        • Professor Chris Bishop: http://conferences.theiet.org/lectures/turing/
        • Gartner: http://www.gartner.com
        • LHC data (2007): http://www-conf.slac.stanford.edu/xldb07/xldb_lhc.pdf
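
The throughput quoted on the Demo metrics slide can be checked from the stated data volumes and execution time. The Python sketch below is only an illustration: the variable and function names are made up for this example, and the figures are copied from the slide. Dividing the total volume by the 72-second run time gives about 3.7 MB/sec, matching the slide. The same division over the stated record counts gives about 15K records/sec, so the slide's 41k/sec figure is not reproduced by this calculation and its basis is not stated.

```python
# Figures copied from the "Demo metrics" slide; all names here are illustrative.
SOURCE_SIZES_MB = {
    "Customers": 155.0,
    "Today's orders": 112.0,
    "Yesterday's orders": 0.3,
}
SOURCE_RECORDS_K = {  # record counts in thousands
    "Customers": 1000,
    "Today's orders": 100,
    "Yesterday's orders": 3,
}
EXECUTION_SECONDS = 72


def rate(total: float, seconds: float) -> float:
    """Return units processed per second."""
    return total / seconds


total_mb = sum(SOURCE_SIZES_MB.values())
total_records_k = sum(SOURCE_RECORDS_K.values())

mb_per_sec = rate(total_mb, EXECUTION_SECONDS)
records_k_per_sec = rate(total_records_k, EXECUTION_SECONDS)

print(f"Total volume: {total_mb:.0f} MB, {total_records_k}K records")
print(f"Throughput:   {mb_per_sec:.1f} MB/sec, {records_k_per_sec:.1f}K records/sec")
```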