
Unlocking value from data with data integration tools

Every day, consumers, businesses and not-for-profit organisations generate increasing volumes of data. Initiatives such as Smart Meters in the utilities sector, along with user-generated 'Web 2.0' data sources and High Energy Physics, are causing an exponential growth in available data. Many businesses seek to take advantage of this data to analyse business performance or to understand trends in customer or prospect behaviour.

Such analysis often means working with very high-volume, complex data sources. Bringing these together in a format that is easy for analysts to understand and query is often very challenging, particularly when business requirements for the data change and a rapid response can mean the difference between profit and loss.

This is just one of many areas in which Data Integration (DI) tools and technologies are being applied, providing the 'plumbing' from a source system to a target system. DI tools are designed to offer an order-of-magnitude increase in developer productivity compared to using languages such as SQL, Java and .NET. This productivity allows developers to deliver more quickly, respond to changes faster, or deliver more with fewer resources.
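The 'plumbing' that such a tool generates from a graphical mapping boils down to an extract-transform-load flow. A minimal hand-coded sketch of that flow, with illustrative data and column names (none of these come from the talk itself):

```python
import csv, io

# Illustrative source data; a DI tool would read this from a CRM/ERP extract.
SOURCE = "id,name,spend\n1,Acme,120.50\n2,Bolt,80.00\n3,Cary,\n"

def extract(text):
    """Extract: parse the raw source into records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: apply business rules (default missing spend, uppercase names)."""
    out = []
    for r in rows:
        out.append({
            "id": int(r["id"]),
            "name": r["name"].upper(),
            "spend": float(r["spend"]) if r["spend"] else 0.0,
        })
    return out

def load(rows, target):
    """Load: append conformed records to the target (a list standing in for a DW table)."""
    target.extend(rows)
    return target

warehouse = load(transform(extract(SOURCE)), [])
print(warehouse[0])  # {'id': 1, 'name': 'ACME', 'spend': 120.5}
```

The productivity claim is that a DI tool lets developers express the `transform` step as reusable graphical rules rather than maintaining code like this by hand for every source.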

According to Gartner, the market for such tools is estimated to grow to $2.7 billion by 2013, and it is currently dominated by a handful of enterprise-class vendors. However, a new crop of Data Integration tools is emerging, with a mix of open source and commercial offerings that each seek to challenge the dominance of the established players.

This talk will discuss the history of this area of technology to help understand the conditions we see today, offer a view of the future of the market and describe how these tools can help drive value within today's business and academic communities. At the end of the talk, attendees will have an opportunity to use one of the commercial tools and make their own minds up about the value of such technology.

Phil Watt is Principal Consultant at one of the world’s largest Systems Integrators, and has been working with high volume enterprise data for more than 17 years, building and designing data warehouses for customers in telco, media, utilities and financial services sectors. During the last 10 years, Phil has worked with a number of Data Integration technologies and advised many businesses about choosing a DI tool and applying best practices in their deployment.



  1. Unlocking value from data with Data Integration Tools
     Phil Watt, Principal Integration Architect, HP Business Intelligence Solutions, EMEA
     29/04/2010
  2. Outline
     Introduction
     Business drivers – why use a DI tool?
       the challenge
       private sector
       public sector
     Background and history
     DI tools timeline
     Emerging features – and value
     Governance and Best Practice
     Selecting a tool for your situation
     Demonstration
     Summary – followed by hands-on session
  3. About me
     19 years big data
     10 years Data Integration tools
       High volume
       Complex business rules
       Governance and metadata management
     Clients include: BSkyB, BT, Barclays/Barclaycard, Centrica, Experian, John Lewis Partnership, Microsoft, a major UK political party
     Strong focus on pragmatic delivery
       Best practices
       Design patterns
       Tool evaluation, selection and implementation
  4. Scope
  5. Glossary
  6. The challenge
  7. Data warehouse example sizes
  8. Public and academic examples
     Birmingham City Council
       http://www.experian.co.uk/www/pages/about_us/our_clients/
       http://www.qas.co.uk/company/press/new-experian-software-helps-public-sector-to-enhance-single-citizen-view-projects-503.htm
     University of Toulouse – academic medical research
       http://www.talend.com/open-source-provider/casestudy/CaseStudy_Academic_Medical_Research_EN.php
  9. Benefits of DI tools
  10. Extract, Transform and Load
      e.g. CRM or ERP system
      Hub and spoke
      Shared DW and ETL server
  11. Extract, Load and Transform
      e.g. CRM or ERP system
      Shared DW and ETL server
  12. ETL versus ELT
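The ETL/ELT distinction covered on these slides can be sketched in a few lines. In ETL the integration engine transforms before loading; in ELT the raw rows are loaded first and the transform is pushed down to the target database as SQL. The data and table names here are illustrative, with sqlite standing in for the data warehouse:

```python
import sqlite3

rows = [("A", 10), ("A", 5), ("B", 7)]  # hypothetical (region, amount) source extract

# ETL: transform (aggregate) in the integration engine, then load only the result.
etl_result = {}
for region, amount in rows:
    etl_result[region] = etl_result.get(region, 0) + amount

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales_summary (region TEXT, total INTEGER)")
db.executemany("INSERT INTO sales_summary VALUES (?, ?)", sorted(etl_result.items()))

# ELT: load the raw rows first, then push the transform down to the target database.
db.execute("CREATE TABLE sales_raw (region TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
elt_result = db.execute(
    "SELECT region, SUM(amount) FROM sales_raw GROUP BY region ORDER BY region"
).fetchall()

print(elt_result)  # [('A', 15), ('B', 7)]
```

Both routes produce the same summary table; the trade-off is where the transformation work runs, in a dedicated ETL server or inside the shared data warehouse engine.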
  13. Multiple sources and targets
  14. DI Tools Features Timeline 1995–2005
  15. DI Tools Features Timeline from 2006
  16. Market features
  17. Gartner Magic Quadrant
      Taken from the research document 'Magic Quadrant for Data Integration Tools'
      Authors: Ted Friedman, Mark A. Beyer, Eric Thoo
      Full report available by registering at www.talend.com
      Image removed for web publication as agreed with Gartner
  18. Magic Quadrant Disclaimer
      The Magic Quadrant is copyrighted November 25, 2009 by Gartner, Inc. and is reused with permission.
      The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period.
      It depicts Gartner's analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner.
      Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the "Leaders" quadrant.
      The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action.
      Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
  19. Best practices
  20. Worst Practices
  21. Gartner advice
      Allocate a minimum of 20% to data source analysis
      Allocate 20–30% to mapping and transformation rules
      Avoid custom coding or desktop tools
      Increase business user involvement to improve success
      (Source: 'Best Practices Mitigate Data Migration Risks and Challenges', May 2009)
  25. Governance and the data integration lifecycle
  26. Best practices
      Do:
        Spend 50% of project time doing discovery, analysis and design
        Get business users involved early and often
        Use tools to accelerate and compress timescales
        Pay attention to governance and metadata
      So you can:
        De-risk the project
        Reduce overall cost and timescales
        Achieve the best possible quality
  Selecting a tool for your situation
  35. Qualification matrix (PW)
  36. Demonstration
  37.–42. (Demonstration screenshots, deck pages 27–32)
  43. Demo metrics
      Performance
      Hardware: dual-core 2.0 GHz Intel Centrino, 2.5 GB RAM
      Environment: WinXP, Oracle Express (DB) + DI tool (Expressor 2.0)
      3 data sources:
        Customers            155 MB    1,000K records
        Today's orders       112 MB    100K records
        Yesterday's orders   0.3 MB    3K records
      Total data volume      267 MB    1.1M records
      Execution time         72 seconds
      Throughput             3.7 MB/sec    41k/sec
  44. Demo features
      Developer productivity
      Graphical development
      Semantic Rationalisation and Re-usable Business Rules
      Demo represents a generic business scenario
      XML, message queues (MSMQ), database inputs/outputs, joins, aggregations and referential integrity management
      Similar features to the ATG/Integrated Basket challenges?
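The joins, aggregations and referential integrity management that the demo exercises can be illustrated in a few lines. This is not the demo itself (which used the Expressor tool against Oracle); it is a hand-coded sketch with made-up customer and order records showing what those three operations mean:

```python
# Hypothetical records standing in for the demo's customer and order sources.
customers = {1: "Acme", 2: "Bolt"}                       # customer_id -> name
orders = [(1, 100.0), (1, 25.0), (2, 40.0), (9, 5.0)]    # (customer_id, amount)

totals, orphans = {}, []
for cust_id, amount in orders:
    if cust_id in customers:                  # referential integrity: key must exist
        totals[cust_id] = totals.get(cust_id, 0.0) + amount  # aggregation per customer
    else:
        orphans.append((cust_id, amount))     # route rejected rows for reprocessing

report = {customers[c]: t for c, t in totals.items()}    # join totals to customer names
print(report)   # {'Acme': 125.0, 'Bolt': 40.0}
print(orphans)  # [(9, 5.0)]
```

In a DI tool each of these steps is a configurable component on the canvas (lookup/join, aggregator, reject link) rather than hand-written code.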
  45. Summary
      Business drivers – why use a DI tool?
        the challenge
        private sector
        public sector
      Background and history
      DI tools timeline
      Emerging features – and value
      Governance and Best Practice
      Selecting a tool for your situation
      Demonstration
  46. Questions
  47. References
      Curt Monash: http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/
      Wired: http://www.wired.com/wired/archive/12.04/grid.html
      ZDNet: http://blogs.zdnet.com/storage/?p=213
      Professor Chris Bishop: http://conferences.theiet.org/lectures/turing/
      Gartner: http://www.gartner.com
      LHC data (2007): http://www-conf.slac.stanford.edu/xldb07/xldb_lhc.pdf
