Today I’d like to cover several topics. I will start with an overview of the barriers that companies face in achieving good data quality. I will then talk about a major new release of our data quality product, Informatica 9. Afterwards, we will hear a customer’s perspective on Informatica 9 data quality, and finally we’ll conclude with a product demo.
The fundamental challenge of getting good data, trustworthy data, is a tremendous problem that companies face today. Every business understands and recognizes the importance of having good data. So why is the quality of data so poor? Why is it so difficult to get data to the point that it is trustworthy? Very simply, the issue comes down to one of complexity. IT data infrastructure is extremely complex and diverse. Here on the slide, you see a single application system. That application contains within it multiple types of data: customer data, product data, order data. Traditionally, different types of data quality technologies have been applied to fix data quality problems for each of them. Customer data quality technology is different from product data quality technology, and so on.
Now let’s look at how the problem gets harder. Not only are there different data quality technologies that apply to different data types within a single application, but a typical business will have many different application systems. Traditionally, a different data quality technology is sold for each type of application system. You may have a specific address cleansing technology applied to Salesforce data, you may have purchased data quality from a certified Siebel partner, you may have data quality technology that comes packaged with SAP. The problem grows even more complex when you consider multiple geographies. For example, you may have one address technology for the UK, and a completely different one for the US. You may have matching technology that works for English, but a different matching technology for Chinese.
And the problem of fixing bad data gets even harder because the required level of data quality differs depending on the business purpose. To make matters even worse, across the entire organization there are only a few individuals (most likely in IT) who are tasked with using data quality technology to address this problem of bad data. How can they do this? How can IT individuals anticipate the business needs across the organization and understand the entire breadth of all the data in all the applications in every geography within an enterprise? This is why data governance is a hot topic among customers today. Yet even with processes and organizational structure applied to governance, this problem presents an enormous challenge, a seemingly overwhelming challenge.
But it is a challenge that Informatica 9 was built specifically to help customers overcome. With Informatica 9, business stakeholders are empowered to actively participate in identifying critical data quality issues and specifying the rules that enforce high standards for data quality. This allows the organization to make the right level of data quality investment for each business initiative. With Informatica 9, the same technology applied to customer data quality can be applied to product data quality. The same technology applied to Oracle data can be applied to SAP data, or Salesforce data. This is what we believe is needed in order to finally apply data quality technology towards productive business ends. Data quality must be applied pervasively. Data quality must be pervasive.
Pervasive data quality means allowing all stakeholders, both in business and in IT, to participate in the data quality process. Pervasive data quality means data quality that applies to all data and all business purposes. And finally, pervasive data quality means being open to all applications. This is an overview of what we are offering with Informatica 9. What I’d like to do now is spend time going into the more specific product capabilities that support each of these three claims.
Vendors in our industry have been claiming for years that their tools are business friendly and that data quality tools are being used across an enterprise community. Recent studies suggest otherwise. In an August 2009 survey conducted by Gartner, companies that had purchased data quality tools had, on average, deployed them to only a handful of users. Why is this? The impact of poor data is enormous and, as we saw earlier, there are so many different areas where data quality can be applied. So why are only a few people involved in fixing these problems? Why are only a handful of individuals using data quality tools? Well, the research report suggests that a significant hindrance to adoption is that the tools are not appropriate for business users. Specifically, the quote from Gartner reads: “the tools in the market are technically complex and oriented toward IT roles and skills, rather than business roles and skills.” In other words, if the data quality industry can deliver truly business-oriented tools (not just simplified or repurposed versions of the same technical tools), then business stakeholders will finally be able to actively participate.
This is why we have received such positive feedback on Informatica 9. With Informatica data quality, we offer business users a 100% browser-based interface. They can create, modify and monitor data quality scorecards that provide a business view of the health of their data. They can correlate the health of their business with the health of their data. They can understand the relationship between the health of their data and the efficiency of their business processes. And if actions need to be taken, they can simply right-click and email a URL to a coworker to show them the specific data issues that are impacting their business. So now ANYONE who can use a web browser or send emails can participate in the data quality process. That’s how easy we’ve made it with Informatica 9.
The new release also offers data analysts and data stewards a UI that’s 100% browser based. Here you can see a screenshot of our profiling analysis and quality rule specification screen. This empowers business users to directly engage at every phase of the data quality process: from data review and analysis, through reference data specification and rule specification, to scorecard creation and monitoring. This UI has been designed from the ground up to cater to business users. We have invested specifically to provide an exceptional user experience, in terms of both ease of use and functional breadth, with our new Informatica 9 offering.
This investment applies not only to the business users among our customers, but of course to the IT developers as well. Something I hear very often from companies interested in making a data quality investment is that they are concerned they do not have enough trained staff. With Informatica 9, we allow customers to leverage their existing staff of data integration developers to be immediately productive on data quality projects. The first productivity gain for IT was actually mentioned in the two previous slides. Getting the business to actively participate and specify data quality rules… That is huge for IT. This frees up IT to focus time on infrastructure tasks, instead of sending emails back and forth, meticulously working to validate requirements. For the IT developer, Informatica 9 has a dedicated developer UI with a full palette of data quality transformations. In addition, we can showcase the Informatica 9 services-based architecture. The same profiling service that is available in the business interface is also available to the developer. Now, it’s often the case that a developer profiles data in order to better understand how to develop transformation mappings. For example, they want to know the ranges of values to account for, or the group structure for aggregator transformations. Informatica 9 is the only product to offer mid-stream profiling. This means you can profile data at any point in the transformation pipeline. There is no other product on the market that offers this level of debugging, and it provides developers with tremendous productivity savings. Tasks that may have taken hours are now doable in minutes.
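To make the idea of profiling at any point in a transformation pipeline concrete, here is a minimal sketch in plain Python. It is not Informatica’s API or implementation: the `profile` and `run_pipeline` functions, the `country` column, and the sample rows are all hypothetical, chosen only to show how a profile taken after each step reveals what that step did to the data.

```python
def profile(rows, column):
    """Summarize one column: row count, nulls, distinct values, min and max."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def run_pipeline(rows, steps, probe_column):
    """Apply each step in order, recording a profile after every step."""
    profiles = [("source", profile(rows, probe_column))]
    for name, step in steps:
        rows = [step(dict(row)) for row in rows]  # copy so steps never mutate input
        profiles.append((name, profile(rows, probe_column)))
    return rows, profiles


def standardize_country(row):
    """Hypothetical transformation: trim and upper-case the country code."""
    value = row.get("country")
    row["country"] = value.strip().upper() if value else None
    return row


# Hypothetical sample data with inconsistent country codes.
source_rows = [
    {"country": " us"},
    {"country": "US"},
    {"country": "uk "},
    {"country": None},
]

result_rows, profiles = run_pipeline(
    source_rows, [("standardize_country", standardize_country)], "country"
)
for step_name, stats in profiles:
    print(step_name, stats)
```

Comparing the profile recorded at the source with the one recorded after `standardize_country` shows the number of distinct values dropping, which is the kind of feedback a developer would otherwise have to collect by loading the data to a target and querying it.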
I mentioned earlier that a key issue for data quality technology is the functional breadth of the offering. What you see on the slide is research indicating that customers need data quality for multiple types of data. 60% of survey respondents apply data quality to customer data, while approximately 20% of respondents indicated a need for data quality applied to other data domains such as product or location. It is very important to understand these survey results within the context of customer adoption practices. Customers often start a data quality initiative with a quick win in an area involving one specific data type, for example address cleansing. Then they look to build on their project success by expanding the scope of their data quality initiative to include other types of data. This is exactly what happened at one of our customers, Avaya, where they received a quick $2M ROI just from address cleansing, but felt strongly about expanding the scope of their data quality investment to yield greater gains.
They need to be reused when synchronizing data with a new app, when entering information in real time into that new app, and when exchanging new partner data in batch with the application.
Something else that enables more reuse is broad geographic coverage. In the area of address validation, Informatica delivers a single engine that covers more than 240 countries and territories, the broadest coverage on the market. As you may be aware, this year Informatica acquired AddressDoctor, a premier address cleansing technology. Informatica 9 will be the first to adopt the latest major release of AddressDoctor, so Informatica’s customers will be the first to benefit from its enhanced performance, as much as 6X faster processing in some geographies, and from major new capabilities such as integrated geocoding. We expect that trend to continue as we coordinate our development efforts and ensure that we take advantage of the best that AddressDoctor has to offer.
Now, the notion of broader geographic coverage applies not only to address cleansing, but also to identity matching. Informatica provides highly accurate pre-built matching rules that have been tuned for over 50 different countries. What that means is that Informatica can match customer data not only in English, but also in Chinese, Arabic, Cyrillic and many other language scripts. The slide illustrates a capability completely unique in the market. Informatica data quality is the only offering that can match identity data across languages. In other words, if you have duplicate data about the same person or company represented in multiple languages, you can still match that data. That’s just one indication of the level of sophistication of the matching technology available from Informatica.
So we’ve talked about the need to involve all stakeholders, and we’ve talked about the need to comprehensively support all data and all purposes. The third element of pervasive data quality is to be open to all applications. Why is this important? This is a Gartner slide that talks about the various threats to an application’s level of data quality. Thanks to the rapid uptake of various data integration technologies, there are many ways that data can make its way into an application, and hence there are many ways that bad data can make its way into an application. The ramification, then, is that cleansing data within a single application is no longer good enough. What good is it to clean up all the data within a specific application, only to have bad data flood in from other applications? Data quality must be applied broadly (pervasively, if you will) across all applications in your enterprise.
Earlier we talked about how traditional data quality was applied sparingly, to but a few types of databases, because connectivity is not a core competency for data quality. Connectivity, however, is a core competency for data integration. At Informatica, we can offer the best of both worlds. Because Informatica data quality is built on top of the Informatica platform, we can leverage Informatica’s breadth of connectivity. Now, I’ve heard other claims of universal connectivity simply based upon support for ODBC. At Informatica, our connectivity support goes well beyond relational databases, into ERP, and well beyond the enterprise and into the cloud and partner data exchanges.
Now, depending on the application, data quality may need to be applied with different latencies and protocols. Again, because our data quality is built on top of proven data integration technology, we offer data quality applied in real time as well as during bulk loading. The rules are the same. The data quality rules are specified and refined independently of whether they get applied in real time or in bulk loading. In fact, with Informatica 9’s data services capabilities, these data quality rules can be applied via multiple protocols: via web services calls or SQL calls. Imagine that: applying data quality to data that is requested via SQL. Informatica 9 is the first and only offering that can do this. Nowhere else can you get data quality that can be applied in this fashion.
By offering broad connectivity, covering a wide range of latencies and protocols, what this allows is centralized data quality rules management. Most applications come with some sort of data quality rules management. They may not call it that, but they are implementing some form of data quality rule. For example, you might have field validation triggers that perform customer name standardization in your CRM application. You may have some matching rules in your MDM application, and so on. These application-specific rules may work on one form of data entry, but not others, and therefore the application is still prone to bad data. For example, that field validation trigger may prevent inconsistent values from being entered into your customer application. But what happens when you bulk load a list of customers supplied by a partner? The data (in particular the BAD data) now completely bypasses that data quality field validation trigger. This example illustrates why many companies are looking to abstract certain rules outside the application. With Informatica 9, you can do this: you can abstract the data quality rules out of the application and define and manage data quality services that not only better protect that specific application, but also benefit many other applications.
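The field-validation-versus-bulk-load example can be sketched in a few lines of plain Python. This is an illustration of the general pattern of one rule shared by two entry paths, not Informatica’s API: the function names (`validate_customer`, `on_form_submit`, `bulk_load`), the record fields, and the deliberately simplistic email check are all assumptions made for the example.

```python
import re


def standardize_name(name):
    """Collapse repeated whitespace and title-case a customer name."""
    return re.sub(r"\s+", " ", name).strip().title()


def validate_customer(record):
    """Central rule (hypothetical): return (cleaned_record, list_of_issues)."""
    issues = []
    cleaned = dict(record)

    name = (record.get("name") or "").strip()
    if not name:
        issues.append("missing name")
    else:
        cleaned["name"] = standardize_name(name)

    # Simplistic check for illustration only, not real email validation.
    email = (record.get("email") or "").strip().lower()
    if "@" not in email:
        issues.append("invalid email")
    cleaned["email"] = email

    return cleaned, issues


def on_form_submit(record):
    """Real-time path: reject a single record at the point of entry."""
    cleaned, issues = validate_customer(record)
    if issues:
        raise ValueError("; ".join(issues))
    return cleaned


def bulk_load(records):
    """Batch path: apply the same rule, setting bad rows aside for review."""
    accepted, rejected = [], []
    for record in records:
        cleaned, issues = validate_customer(record)
        if issues:
            rejected.append((cleaned, issues))
        else:
            accepted.append(cleaned)
    return accepted, rejected
```

Because both `on_form_submit` and `bulk_load` call the same `validate_customer` function, a partner file loaded in batch cannot bypass the check that protects interactive entry; only the handling of failures differs between the two paths.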
Informatica 9 pervasive dq
Informatica 9: Pervasive Data Quality Across the Entire Enterprise
Agenda <ul><li>Barriers to data quality </li></ul><ul><li>Informatica 9 (Pervasive Data Quality) </li></ul>
How can we trust our data? CUSTOMER PRODUCT ORDER
How can we trust our data? CUSTOMER OP CONTACT CUSTOMER FORECAST ORDER CUSTOMER PRODUCT ORDER CUSTOMER PRODUCT ORDER CUSTOMER INVOICE CONTRACT
Data quality: never good enough… for those who consume it Facilitate decision-making Modernize the business Improve efficiency & reduce costs Improve quality of service Outsource business functions Increase the efficiency of the partner network Mergers & Acquisitions Business Imperatives CUSTOMER OP CONTACT CUSTOMER FORECAST ORDER CUSTOMER PRODUCT ORDER CUSTOMER PRODUCT ORDER CUSTOMER INVOICE CONTRACT
Pervasive Data Quality Informatica 9 Business Imperatives CUSTOMER ORDER PRODUCT INVOICE Centralized data quality rules CUSTOMER OP CONTACT CUSTOMER FORECAST ORDER CUSTOMER PRODUCT ORDER CUSTOMER PRODUCT ORDER CUSTOMER INVOICE CONTRACT Facilitate decision-making Modernize the business Improve efficiency & reduce costs Improve quality of service Outsource business functions Increase the efficiency of the partner network Mergers and Acquisitions
<ul><ul><li>Unified data quality services available to everyone </li></ul></ul><ul><ul><li>Comprehensive support for all data types and all purposes </li></ul></ul><ul><ul><li>Open to all applications </li></ul></ul>Pervasive Data Quality Informatica 9
Pervasive Data Quality The need to involve all stakeholders Most companies have 2 to 5 people who interact with data quality tools Number of individuals using data quality tools: 1 (3%), 2 to 5 (55%), 6 to 10 (22%), 11 to 20 (12%), more than 20 (8%) “… the tools in the market are technically complex and oriented toward IT roles and skills, rather than business roles and skills” Gartner Survey, Aug. 2009
Capabilities tailored by role/function <ul><li>100% accessible from a simple web browser </li></ul><ul><li>Data analysis through simple drill-down </li></ul><ul><li>Shared dashboards </li></ul>Collaborate around a scorecard (thin client) “Proactive” data quality
Capabilities tailored by role/function Simple and efficient to use for data stewards & analysts <ul><li>100% accessible from a simple web browser </li></ul><ul><li>Standardize reference data </li></ul><ul><li>Create reusable rules </li></ul>Reduce systematic reliance on IT
<ul><li>Predefined data quality rules & transformations </li></ul><ul><li>Profiling service applicable to any component (active/passive) </li></ul><ul><li>Reusable rules </li></ul>Capabilities tailored by role/function Significant improvement in development (dev & acceptance testing) Profile data at any step of a process
Pervasive Data Quality The need to support multiple types of data Percentage of respondents Customer/Party Location Product/Material Financial “We roughly calculated $2M of ROI from address standardization alone. And yet that is not where the real ROI is measured…”
Universal data quality Data Warehouse Data Migration Test Data Management & Archiving Master Data Management Data Synchronization B2B Data Exchange Data Consolidation IT Projects Data Quality Banking Location Capital Customer Finance Product <ul><li>Cleansing and matching for all domains </li></ul><ul><li>All project types (e.g., data migration, MDM) </li></ul><ul><li>Reuse of components (e.g., source definitions, DQ rules, profiling, etc.) </li></ul>Lower costs, improved standardization, greater reuse
Universal data quality <ul><li>240 countries, 1 single engine </li></ul><ul><li>High performance </li></ul><ul><li>Integrated geocoding </li></ul>Address validation Worldwide geographic coverage Reliable, enriched customer & address data worldwide
Universal data quality High-performance matching services International data ABDULLAH AL MUSA A.ALLAH ALMOUSA عبدالاله الموس PEG MC CARY MARGARET MACCLARY GRIETJE MCCLLARY ΠΑΠΑΔΟΠΟΥΛΟΣ GΕΩΡG ΓΕΩΡΓ ΠΑΠΑΔΟΠΟΥΛΟΣ GEORGE PAPADOPLOUS TARO YAMADA 山田太郎 ﾔﾏﾀﾞ ﾀﾛｳ Provides reliable and efficient complex identity resolution services regardless of language and country, with an embedded worldwide knowledge base
Pervasive Data Quality The need to be open to all applications Establish a “firewall” to protect your applications and business processes Controls at the business boundary Incoming data is analyzed Incomplete data is isolated Data is corrected where desired Owners are identified and alerted Process-oriented update controls
Open to all applications Universal connectivity Informatica Data Quality Application Unstructured Database Cloud Computing Partner Data SWIFT NACHA HIPAA …
Open to all applications Centralized data quality rules At the point of entry Or in bulk loading Centralized DQ rules Customer Order Product Invoice Rules Rules Rules Rules
<ul><li>Once centralized, data quality rules become an enterprise-wide standard </li></ul>Centralized DQ rules Customer Order Product Invoice Rules Rules Rules Rules Customer Service Portal Sales Automation Application BI Application Open to all applications Centralized data quality rules