An Eye on the Future: A Review of Data Virtualization Techniques to Improve Research Analytics (Richter)

Clinical Informatics

  • Change History v7.1 - added links to slide 14 (vendor info) and corrected spelling errors - added comments to slides 2 and 13
  • Quote from the "BeyeNETWORK: Comprehensive resources for business intelligence and data warehousing professionals". Web link: http://www.b-eye-network.com/view/14815

    Rick van der Lans: Rick F. van der Lans is an independent consultant, author and lecturer specializing in XML, data warehousing, application integration, and information modelling. He is Managing Director of R20/Consultancy, based in The Netherlands. Rick has advised many large companies worldwide on defining their data warehouse architectures. He is chairman of the Database Systems Show (organized annually in The Netherlands since 1984), and he is a columnist for two major newspapers in the Benelux, "Computable" and "DataNews". Additionally, he is an advisor for magazines such as Software Release Magazine and Database Magazine.

    Leading DV vendor reports:
    - A top 5 global bank used DV to create an SOA data services layer across 200+ sources and 20+ applications. Results: 250% ROI in 3 months elapsed time, 2% revenue increase within the business unit, 50-60% reduction in integration design and development time for new applications and portals, 25% increase in object reuse for downstream BI reporting projects.
    - A top 3 global pharmaceutical firm used DV to quickly prototype, develop and deploy the new information solutions required to support strategic business decisions. Results: 90% reduction in time to create a new report, 10X faster time-to-business-value for new views and applications, 200% ROI in 3 months elapsed time, 100% increase in key business analyst productivity, and 5% improvement in R&D project on-time delivery.
    - A cable and home Internet service provider used DV to federate data from different sources (data, e-mail data, credential data, etc.) into one virtual layer. Results: transaction time for processing a request reduced from an average of 5 seconds to 1 second, manpower to manage the systems reduced by 20% (development costs reduced), time-to-market for deploying new applications reduced by 25%.

    Notes & slide source: PPT presentation "Data Virtualization .. Data Execution Strategy & Architecture Presentation", 0/27/2011, by Wayne Little & Raj Devkaran (KPIT)
  • Some key features of Data Virtualization:
    - Multiple data delivery methods to the consuming applications: SOA data services, ODBC/JDBC, REST
    - Embedded metadata in the virtual platform
    - Availability of re-usable data quality and data transform definitions
    - Scheduling & pre-processing
    - Data caching
    - Data discovery
    - Rapid prototyping

    DV enables integration of structured (databases), semi-structured (spreadsheets) and unstructured (weblogs, PDF files) data. Data can be cached to pre-fetch static or large data sets. Cache jobs are usually scheduled, while DV jobs usually run in real time. Mobile devices can also "connect" to the virtual schema. Data warehouse extension and virtual data marts are some other uses.
  • Concerns:
    - Timing
    - Costs (hardware, programming, support)
    - Constant redesign of database and data movement (ETL) processes
    - Data quality issues introduced within the process
    Benefits:
    - Single source of data: a consistent view of the data for all users
  • Concerns:
    - Same issues as the EDW
    - Data duplicated between data marts
    - Inconsistent ETL processes/business rules may be applied
    - If sourced from the EDW, it may not be possible to trace back to the true data source
    Benefits:
    - Allows for consistent interpretation of the data sets involved
    - ETLs can be subject-area specific
    - Smaller datasets improve performance
  • Concerns:
    - Usually requires all databases to be from the same vendor
    - May require users to know the relationships of data between different databases
    - Performance issues (network, source system, end user)
    Benefits:
    - Allows access to data across databases without physical movement of the data
    - No ETLs and minimal IT involvement
  • Concerns:
    - Requires all data to be in the same database unless distributed database links or something similar is used
    - Usually requires knowledge of the data, data relationships and rules
    - May introduce performance issues (the view can be "materialized" to resolve this)
    - May require movement of the view dataset
    Benefits:
    - Insulates the user from physical database changes
    - Allows for centralized, consistent application of business rules
    - Allows for centralized, consistent creation of derived data elements
    - Makes data queries easier to write (fewer joins)
    - Fixes need to be applied only once, in the view

    Simple example:

        CREATE VIEW pat_names AS
        SELECT pat_mrn, pat_name
        FROM base.patient
        WHERE patient_type <> 'TEST';

    Complex example:

        CREATE VIEW VDW_DEATH AS
        SELECT smrn.PERSON_ID        AS MRN,
               rdd.DEATHDT           AS DeathDt,
               rdd.DTIMPUTE          AS DtImpute,
               rdd.COD_DX            AS UnderCOD,
               rdd.CODETYPE          AS Codetype,
               rdd.SOURCE            AS Source,
               rdd.SOURCE_CONFIDENCE AS Confidence,
               rdd.DEATH_ID          AS Local_Death_ID
        FROM RSRCH_DEATH_DETAILS AS rdd
        INNER JOIN RSRCH_ID_PAT_PERSON AS smrn
            ON smrn.VDW_EXCLUDED_FLAG = 'N'
           AND rdd.PAT_ID = smrn.PAT_ID
           AND rdd.LINE = 1;
  • Problems addressed by Data Virtualization
    Data challenges:
    - Rapid data proliferation
    - Increasing complexity & volume
    - Data needs beyond structured
    - Data duplication & inconsistency
    - Siloed data approaches
    Dollars & delivery speed:
    - Shrinking budgets
    - Need for personalized real-time health care data
    - Classic delivery methods too slow
    - Increasing regulations
    - Lack of agility to meet business needs in a timely way

    Value proposition for Data Virtualization
    Faster, more agile delivery:
    - Quicker time to market: virtual rather than physical integration
    - Updating data views is easier/faster than the corresponding physical changes for DB/DW/DM
    Single unified source of consistent on-demand data access:
    - Re-use of consistent rules: data cleansing, business rules, security rules, etc.
    - Common "virtual schema" with unified metadata and definitions
    Uniform data integration for multiple workflows and downstream consumers:
    - Data services to SOA services (foundational data support for SOA strategy)
    - ETL
    - Reporting & analytics
    - Web portals
    Combine multiple data sources, types & formats into a complete and unified set:
    - Structured to unstructured data
    - Internal to external data (including cloud)
    - Operational systems to DWs to data marts to web applications
  • Picture – free usage from http://freebigpictures.com/clouds-pictures/cloud-sea/
  • AN INTERNET-BASED (i.e. Cloud) DATABASE HOSTING SERVICE: a service which hosts the users' database on a remote system that can be accessed via the internet. Database support, security and management are performed by the service vendor. It may or may not present a virtualized data view. It may require user software in addition to an internet browser to access the data.
    Some vendors:
    - SimpleDB – NoSQL (Amazon)
    - ClearDB (CloudDB)
    - CouchDB – NoSQL (CouchOne)
    - Xeround - Sql*Server (Amazon)
    - AppEngine (Google)
    - Database.com

    AN INTERNET-BASED (i.e. Cloud) HOSTING SERVICE: a database-independent service which presents the users' database(s) to internet users. Database support, security and management are performed by the customer. The tool may or may not present a virtualized data view. It may require user software in addition to an internet browser to access the data.
  • Cloud Database as a Generic Database Access Service: under this definition, a cloud database is one where a database access service connects users with databases hosted on the internet (i.e. cloud-based databases). These databases may be hosted by the vendor running the access site or by others. Generally, the user does not need to know the database format or the method by which the database is implemented.
    - SaaS: Software as a Service
    - IaaS: Infrastructure as a Service or Information as a Service
    - PaaS: Platform as a Service
  • The Forrester Wave™: Data Virtualization, Q1 2012. Informatica, IBM, Composite Software, And Denodo Technologies Lead, With SAP, Microsoft, Oracle, Stone Bond, And Red Hat Close Behind. By Noel Yuhanna, Mike Gilpin with Adam Knoll. http://www.forrester.com/search?#/The+Forrester+Wave+Data+Virtualization+Q1+2012/quickscan/-/E-RES60746

    Article abstract: In Forrester's 53-criteria evaluation of data virtualization (also known as information-as-a-service, IaaS) vendors, we found that Informatica, IBM, Composite Software, and Denodo Technologies lead the pack because of strong enterprise-class data virtualization features and functionality such as real-time integration, data quality, transformation, caching, and modeling. SAP, Microsoft, Oracle, Stone Bond Technologies, and Red Hat are Strong Performers; each offers a viable option to support particular use cases. Although Oracle no longer positions itself in the data virtualization market, Forrester evaluated its solution as a nonparticipating vendor. Red Hat continues as the most substantial vendor supporting an open source data virtualization solution. This market has sufficiently matured in that its Leaders include large well-established platform vendors, but it continues to exhibit important innovation from smaller players more exclusively focused on information virtualization and federation.
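The notes on database links and views can be tried out on a small scale. The following is a minimal sketch, not the authors' implementation: it uses Python's built-in `sqlite3` module, with `ATTACH DATABASE` standing in for a vendor database link and a temporary view standing in for a synonym. The `lab.result` table, the `patient_labs` view, and all sample rows are hypothetical; only the `pat_names` view follows the simple example in the notes (with `base.patient` simplified to a local `patient` table).

```python
import sqlite3

# "main" plays the membership database; "lab" is a second, separate database.
conn = sqlite3.connect(":memory:")
# Stand-in for a database link: make the second database reachable by name.
conn.execute("ATTACH DATABASE ':memory:' AS lab")

conn.executescript("""
    CREATE TABLE main.patient (
        pat_mrn TEXT PRIMARY KEY, pat_name TEXT, patient_type TEXT);
    INSERT INTO main.patient VALUES
        ('1001', 'Alpha, Ann',  'REAL'),
        ('1002', 'Zztest, Zed', 'TEST'),
        ('1003', 'Beta, Bob',   'REAL');

    CREATE TABLE lab.result (pat_mrn TEXT, test_name TEXT, value REAL);
    INSERT INTO lab.result VALUES
        ('1001', 'HbA1c', 6.1),
        ('1002', 'HbA1c', 5.0),
        ('1003', 'HbA1c', 7.4);

    -- The simple view from the notes: the "no test patients" rule lives here only.
    CREATE VIEW main.pat_names AS
        SELECT pat_mrn, pat_name FROM patient WHERE patient_type <> 'TEST';

    -- In SQLite only TEMP views may reference more than one attached database.
    -- Consumers query this name without knowing where the lab data lives.
    CREATE TEMP VIEW patient_labs AS
        SELECT p.pat_mrn, p.pat_name, r.test_name, r.value
        FROM main.pat_names AS p
        JOIN lab.result AS r ON r.pat_mrn = p.pat_mrn;
""")

rows = conn.execute(
    "SELECT pat_name, test_name, value FROM patient_labs ORDER BY pat_mrn"
).fetchall()
for row in rows:
    print(row)
```

The consumer query never mentions `patient_type` or the `lab` database: the business rule is applied once in `pat_names`, and no rows are copied between the two databases. The trade-off listed under the concerns still holds, since every query against `patient_labs` runs against both sources at request time.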

    1. An eye on the future: A review of Data Virtualization Techniques to improve research
       Presenter: Jack A Richter
       Authors: Jack A Richter, Lela McFarland, Christine Bredfeldt, Ph.D
       Contributors: Rajesh V DevKaran (KPIT), Wayne Little (KPIT), Sriram Thiruvenkatachari (KPIT), Sean Mikha (Teradata), Steven C Werntz (KPGA)
       MID-ATLANTIC PERMANENTE RESEARCH INSTITUTE
    2. What is "Data Virtualization"?
       "Data virtualization is the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology."
       Source: "Clearly Defining Data Virtualization, Data Federation and Data Integration", Rick van der Lans, industry analyst and author specializing in DW, SOA, and database technology. The Hague, The Netherlands, Dec 16, 2010.
       May 2, 2012 | © 2011 Kaiser Foundation Health Plan, Inc. For internal use only.
    3. Data Virtualization
       [Diagram labels: Virtualized Data; Data Virtualization Technique; data sources (Teradata; Flat Files, Spreadsheets, etc.; Oracle; 3rd Party Data Sources; SQL/Server; Enterprise DW)]
       Source: Data Virtualization Use Cases 12-23-2011.docx, by Wayne Little and Rajesh V DevKaran
    4. Enterprise Data Warehouse (EDW)
       - All pertinent data moved to one physical database
       [Diagram labels: Enterprise Data Warehouse (EDW); Multiple Custom ETL Processes; source systems (Claims, Lab, Radiology, Membership, Pharmacy, HR, Payroll, etc., Scheduling, CRM, etc.)]
    5. Data Marts – A Subset of the EDW
       - Data subject area consolidated in one database
       [Diagram labels: Membership and Utilization Data Mart; Lab Data Mart; Multiple Custom ETL Processes (one set per data mart); source systems (Claims, Lab, Radiology, Membership, Pharmacy, Oncology, Micro Labs, Scheduling, CRM)]
    6. Distributed Database Links and Synonyms
       - Create links between databases with synonyms so data can be accessed and used between databases without knowledge of the source database.
       [Diagram labels: Membership and Utilization Data Mart (Views); Claims Data Mart (Views); Link Definitions with synonyms (one set per data mart); source systems (Claims, Lab, Radiology, Membership, Pharmacy, Oncology, Micro Labs, Scheduling, CRM)]
    7. SQL or Tool "Views"
       - Views: creating a "view" to the data
       [Diagram labels: Patient Encounters; Patient Claims; Procedures; PATIENT_PROCEDURES (View); Procedure Definition Table; Back Office Procedures; Encounter Procedures]
    8. A Data Virtualization Diagram
       [Diagram labels: consumers (Web Application/Browser, Cognos, SOA, Business Objects, SQL Clients, SAS, Mobile Clients/Apps); Virtual Schema; Data Virtualization Tool; data sources (Teradata; Flat Files, Spreadsheets, etc.; Oracle; 3rd Party Data Sources; SQL/Server; Enterprise DW)]
       Source: Data Virtualization Use Cases 12-23-2011.docx, by Wayne Little and Rajesh V DevKaran
    9. The Forecast for Databases is Partly Cloudy
       What is a "Cloud Database"?
       - A database hosting service accessible via the Internet
       - An Internet service that connects users to databases accessible via the Internet
    10. Cloud Database as a Hosting Service
       [Diagram labels: Public or Private Approved User or Application; Internet Cloud; LOC Access Point; Access Point - Virtual Schema or Pass through direct to databases; Library of Congress Cloud Database Hosting System; Your Cloud Hosting System; Databases (SQL/Server, Teradata, Oracle, Enterprise DW); Flat Files, Spreadsheets, etc.]
    11. Cloud Database as a Database Access Service
       [Diagram labels: Public or Private Approved User or Application; Internet Cloud; LOC Access Point; Your Virtual Schema; Generic Access Point; Library of Congress Cloud Database Hosting System; Generic Vendor Cloud Database Access System; Your Cloud Hosting System; Databases (SQL/Server, Teradata, Oracle, Enterprise DW); Flat Files, Spreadsheets, etc.; Business A database; Business B database]
    12. Special Thanks to:
       - Kaiser Permanente IT
         – Rajesh V. DevKaran, Consultant Specialist, Analytic Solutions & Standards, Enterprise Architecture/Information Management
         – Sriram Thiruvenkatachari, Principal Solutions Consultant, Infrastructure Mgt Group, Systems Integration
         – Wayne Little, Principal Information Architect, Information Architecture and Standards Mgt, Enterprise Architecture/Information Management
       - Kaiser Permanente Georgia
         – Steven C. Werntz, Manager, Business Intelligence Solutions, KPGA
       - Teradata
         – Sean Mikha, Solution Architect, Finance, HealthCare and Insurance (FIH)
    13. Contacts, Vendors and Links
       - Contacts for this presentation
         – Jack Richter – Jack.A.Richter@kp.org
         – Lela L McFarland – Lela.L.McFarland@kp.org
         – Christine E Bredfeldt – Christine.E.Bredfeldt@kp.org
       - Some Vendors & Links – Data Virtualization
         – Vendor – Composite Software, Inc. - http://www.compositesw.com/
         – Vendor – Denodo Technologies, Inc. - http://www.denodo.com/en/index.php
         – Vendor – IBM - http://www.ibm.com/search/csass/search?sn=mh&q=cloud%20database&lang=en&cc=us&en=utf
         – Vendor – Informatica Corporation - http://www.informatica.com/us/products/data-virtualization/
         – White Paper – Data Virtualization reaches Critical Mass - http://purl.manticoretechnology.com/ImgHost/582/12917/2011/resources/white_papers/DataVirtualizationReachesCriticalMass.pdf
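The "virtual schema" in slide 8 and the caching feature mentioned in the notes can be illustrated with a toy model. This is a sketch under stated assumptions, not how any of the listed vendor tools work: `VirtualSchema`, `read_members`, `read_sites`, `member_with_site`, and all sample data are hypothetical names invented for the illustration. One source is a relational table (in-memory SQLite), the other a flat file (inlined CSV text); consumers ask the schema for a logical name and never see where the rows come from.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory stand-in for a warehouse table).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE member (mrn TEXT PRIMARY KEY, name TEXT);
    INSERT INTO member VALUES ('1001', 'Alpha, Ann'), ('1003', 'Beta, Bob');
""")

# Source 2: a flat file, inlined so the sketch is self-contained.
SITE_FILE = "mrn,site\n1001,North\n1003,South\n"


def read_members():
    """Fetch member rows from the database at request time."""
    cur = db.execute("SELECT mrn, name FROM member ORDER BY mrn")
    return [{"mrn": mrn, "name": name} for mrn, name in cur]


def read_sites():
    """Parse the flat file into rows keyed by column name."""
    return list(csv.DictReader(io.StringIO(SITE_FILE)))


class VirtualSchema:
    """Maps logical dataset names to loaders; optionally caches results."""

    def __init__(self):
        self._sources = {}
        self._cache = {}

    def register(self, name, loader, cached=False):
        self._sources[name] = (loader, cached)

    def rows(self, name):
        loader, cached = self._sources[name]
        if not cached:
            return loader()           # live: hit the source on every request
        if name not in self._cache:
            self._cache[name] = loader()
        return self._cache[name]      # cached: reuse until refreshed

    def refresh(self, name):
        """Drop a cached dataset, as a scheduled cache job might."""
        self._cache.pop(name, None)


schema = VirtualSchema()
schema.register("member", read_members)                  # live source
schema.register("member_site", read_sites, cached=True)  # static, so cached


def member_with_site(vs):
    """A virtual 'view' that joins the two sources without copying either."""
    site_by_mrn = {row["mrn"]: row["site"] for row in vs.rows("member_site")}
    return [{**m, "site": site_by_mrn.get(m["mrn"])} for m in vs.rows("member")]


result = member_with_site(schema)
for row in result:
    print(row)
```

Marking the flat file as cached mirrors the note that caching suits static or large data, while the `member` table is read live on each request. A real tool would also push filters and joins down to the sources rather than joining in application memory, which this sketch does not attempt.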
