The problem is that today it takes too long to deliver new critical data or reports to the business. The results of the 2011 TDWI BI Benchmark Report show this: on average, it takes months to add a new source of data to a data warehouse.
Let’s see why it takes so long. A typical data integration process is multi-step by nature and involves the business only when it is too late. At that point, if the business wants changes, needs other data, or identifies inaccuracies, getting IT’s help means going back into a queue and waiting while IT works through its backlog of requests.
The reasons for this delay in delivering new data and reports are manifold:
• It takes too long for the business to explain requirements to IT
• It takes months for IT to change a DW or add new critical data
• It takes many iterations between business and IT to get the right data and reports
• Any changes in the underlying data sources break integrations and impact consuming applications
• Directly accessing operational systems is not possible or not ideal
Informatica enables business and IT to deliver a current, complete, and trusted view of the business within days rather than months. It does this by:
• Creating a common logical data access layer across all data sources. If it is not possible or desirable to hit an operational system directly, data replication can create a replica, which is then used as the source. The Analyst can do this step without waiting for IT’s help.
• Accessing and merging diverse data into a virtual view without physically moving the data. The Analyst can also do this step without waiting for IT’s help.
• Involving the Analyst to analyze and profile the federated data or the virtual view, with no staging and no further processing.
• Applying advanced transformations, including data quality, in real time to the federated data or virtual view.
• Delivering data services or virtual views that can be instantly reused across projects.
All these capabilities are available as a single package called PowerCenter Data Virtualization Edition. Enterprises can reuse existing Informatica skills and data integration logic to deliver BI projects up to five times faster and at a third of the cost.
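The idea of merging diverse data into a virtual view without physically moving it can be sketched generically. The example below is only an illustration under stated assumptions: it uses Python's standard `sqlite3` module as a stand-in, not any Informatica API, and the names `accounts`, `orders_db`, `orders`, and `customer_orders` are hypothetical. Two separate in-memory databases play the role of two source systems, and a view joins them at query time.

```python
import sqlite3


def build_virtual_view() -> sqlite3.Connection:
    """Expose two independent 'sources' through one virtual view (illustrative only)."""
    # The main database stands in for a CRM source (hypothetical).
    conn = sqlite3.connect(":memory:")
    # A second, separate in-memory database stands in for an order system (hypothetical).
    conn.execute("ATTACH DATABASE ':memory:' AS orders_db")

    conn.execute("CREATE TABLE main.accounts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders_db.orders (account_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.accounts VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO orders_db.orders VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # A TEMP view may reference tables in any attached database.
    # No rows are copied: the join runs against the sources on every query.
    conn.execute(
        """
        CREATE TEMP VIEW customer_orders AS
        SELECT a.id, a.name, SUM(o.amount) AS total
        FROM main.accounts AS a
        JOIN orders_db.orders AS o ON o.account_id = a.id
        GROUP BY a.id, a.name
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_virtual_view()
    for name, total in conn.execute(
        "SELECT name, total FROM customer_orders ORDER BY id"
    ):
        print(name, total)
```

Consumers query `customer_orders` as if it were a single table, while the data stays in the two underlying databases. A production data virtualization layer would add query optimization, caching, and security on top of this basic pattern.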
“The ability to switch seamlessly and transparently between delivery modes (bulk/batch vs. granular real-time vs. federation) with minimal rework will be key for IT organizations seeking to develop a successful data integration strategy.”
Ted Friedman, VP Distinguished Analyst, Gartner
The Hive Data Virtualization Introduction - Sanjay Krishnamurti, Chief Architect, Informatica
1. Informatica’s Data Virtualization Solution
Sanjay Krishnamurthi
May 28, 2013
2. Problem Statement: Takes too long to get business the data it needs
2011 TDWI BI Benchmark Report: Organizational and Performance Metrics for Business Intelligence Teams
On average, how long does it take to add a new source of data to your data warehouse?
On average, how long does it take to create a complex report or dashboard with about 20 dimensions, 12 measures, and 6 user access roles?
1 week: 11%, 2 weeks: 7%, 3 weeks: 7%, 1 month: 22%, 2 months: 20%, 3 months: 14%, 4-6 months: 12%, 6 months or more: 7%
On average, how long does it take to change a hierarchy (e.g. a new way of classifying products or organizing sales regions)?
1 week: 25%, 2 weeks: 13%, 3 weeks: 8%, 1 month: 25%, 2 months: 10%, 3 months: 6%, 4-6 months: 8%, 6 months or more: 4%
1 week: 15%, 2 weeks: 9%, 3 weeks: 13%, 1 month: 16%, 2 months: 16%, 3 months: 16%, 4-6 months: 9%, 6 months or more: 5%
3. Why Does it Take So Long?
• It takes too long to explain requirements
• It takes months to change a DW/add new critical data
• It takes many iterations to get the right data/reports
• Changes break integrations & impact applications
• Directly accessing operational systems is not possible / ideal
Diagram: Typical Data Integration Process (Design, Change, Integrate, Unit Test, Validate, Deploy), shown as an As-Is Value Stream Map (LOT OF WAIT & WASTE). Callouts: IT Has a Huge Backlog; Business is Involved Too Late.
4. What is Needed?
Data Virtualization (Built on Lean Principles):
• Profile and cleanse data so it can be readily trusted
• Deliver reusable data services to consumers
• Create a common access layer across data sources
• Quickly & directly access data without movement
Diagram: Enterprise Data Sources feed a Logical View of All Underlying Data (PRODUCT, CUSTOMER, ORDER), which serves Data Consumers (Portal, BI, Composite Apps).
5. Agile Data Platform
Informatica Proprietary/Confidential. Informational Purposes Only. No Representation, Warranty or Commitment regarding Future Functionality. Not to be Relied Upon in Making Purchasing Decision.
Diagram: seven numbered steps over virtual tables, shared between Business and IT roles (Business Manager; Analyst, Steward; Developer, Architect) through Common Metadata:
(1) MODEL: Customer (Name, Address, Category, Orders)
(2) ACCESS & MERGE: Replicated, CRM, Accounts
(3) PROFILE IN RT
(4) TRANSFORM IN RT: Advanced Transformations, Data Quality, Data Masking
(5) REUSE INSTANTLY: Batch, Web Services (Query Engine, WS Server)
(6) MOVE OR FEDERATE: Accounts, Call Center, DW
(7) SCALE & PERFORM: Optimizations & Caching
6. Data Virtualization: Piece of Agile Data Platform Puzzle
• Provides a semantic access layer atop variety of data sources
• Data needs to be clean, masked etc.
• Pre-built library of advanced data transformations, e.g. merge
• Integrated real-time, on-the-fly data profiling & data quality
Diagram: Access, Merge, and Deliver a Virtual View to DW and BI. Callouts: Prototype First; Move to DW or Instantly Reuse as SQL/WS; Advanced Transformations & Data Quality; Analyze & Profile Data & Logic Anytime; Early Business Involvement.
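On-the-fly profiling of a virtual view, as opposed to profiling only the physical sources, can be illustrated with a small sketch. This is a minimal example under stated assumptions, not Informatica's profiling engine: it uses Python's standard `sqlite3` module, and the `profile` helper, the `accounts` table, and the `v_customer` view are hypothetical names introduced for illustration.

```python
import sqlite3


def profile(conn: sqlite3.Connection, relation: str, columns: list[str]) -> dict:
    """Compute simple per-column statistics directly against a table or view.

    `relation` and `columns` are interpolated as quoted identifiers, so they
    must come from trusted code, never from user input.
    """
    stats = {}
    for col in columns:
        total, non_null, distinct = conn.execute(
            f'SELECT COUNT(*), COUNT("{col}"), COUNT(DISTINCT "{col}") '
            f'FROM "{relation}"'
        ).fetchone()
        stats[col] = {
            "rows": total,
            "nulls": total - non_null,  # COUNT(col) skips NULLs
            "distinct": distinct,       # COUNT(DISTINCT col) also skips NULLs
        }
    return stats


def demo() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "Acme", "Retail"), (2, "Globex", None), (3, "Initech", "Retail")],
    )
    # The view is never materialized; profiling runs against it at query time.
    conn.execute(
        "CREATE VIEW v_customer AS SELECT id, name, category FROM accounts"
    )
    return profile(conn, "v_customer", ["name", "category"])


if __name__ == "__main__":
    for column, column_stats in demo().items():
        print(column, column_stats)
```

Because the statistics are computed by querying the view itself, no staging copy is needed, and the same helper works on any later version of the view's logic.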
7. Key Considerations
• Enabling Rapid Development (TIME, COST): 1000s of lines of code, a Maintenance Nightmare v/s a model & metadata-driven environment to Sustain & Maintain
• Analyzing & Profiling (TIME, COST, RISK): Only source profiling, need extra processing, with Many Iterations & Mistakes v/s Profile data AND logic anywhere to Get it Right 1st Time
• Integrating with Quality (TIME, COST, RISK): Hand-coding can’t do advanced transforms (SQL, XQuery, Simple Cleansing, Web Service), with Limited Rules, No Data Quality v/s Leverage pre-built logic including quality to Bake-in Quality in the Virtual Table
• Leveraging Investments (TIME, COST): Re-work, re-deploy & re-train every time to Re-invent the Wheel v/s Naturally extend your infrastructure to Re-purpose Logic & Skills
• Scaling with Flexibility (TIME, COST, RISK): Non-integrated technologies that Overburden Data Virtualization (EII) v/s Virtualize or physically materialize in 1 tool to Prototype First & Then Scale (EII, Optimizations)
8. Leveraging the Power of the Platform
Gartner Magic Quadrant for Data Integration Tools, 2011:
“The ability to switch seamlessly and transparently between delivery modes (bulk / batch vs. granular real-time vs. federation) with minimal rework will be key for IT organizations seeking to develop a successful data integration strategy.”
Ted Friedman, VP Distinguished Analyst, Gartner
The Forrester Wave: Data Virtualization, Q1 2012:
“With v9, Informatica advanced its capabilities with on-the-fly data quality and profiling, a model-driven approach to provisioning data services, performance enhancements, cloud integration, common metadata, and role-specific tools.”