Accelerating Research & Development with Data Virtualization

GSK is one of the world's leading research-based pharmaceutical and healthcare companies. With 13,000 employees in research and development, the IT team must be agile in their ever-changing data management environment. Data virtualization offers the core capabilities needed for greater visibility into research and development information from disparate systems and agile management of complex data. Scott Harker of GSK will share data virtualization use cases that have shown tremendous value to the business. He will share insights on lessons learned, success factors, and the next steps in this evolution that align with the company mission - to help people do more, feel better, live longer.
Learn more: http://www.cisco.com/c/en/us/products/cloud-systems-management/data-analytics/index.html.

  1. ACCELERATING RESEARCH & DEVELOPMENT WITH DV
     Scott Harker, Technical Services Lead
     GlaxoSmithKline, R&D IT
  2. Agenda
     • Early days – issues
     • Rehabilitating the image
     • Use cases
     • Data Virtualization service
     • Entity-centric design
  3. Issues – Misuse and Misunderstanding
     • Initial attempts to use CIS 5.1 failed
       - Didn’t consume complex web services well
       - Didn’t publish complex web services
       - Cache ≠ fast, lightweight ETL
       - JChem cartridge for Oracle didn’t work
       - No clean integration with SharePoint
     • Led to a reputation problem
  4. Rehabilitating the Image
     • Agreement as standard EII solution
       - Had existing perpetual licenses
     • Pilot with 6.1
       - Performance, publishing web services, SAML, JChem, optimizations
       - Good reports on all fronts
     • EA identified two good use cases
     • Full-time SME assigned
  5. Use Case 1: Socrates
     • Two independent uses of CIS
     • Typical “tell me everything about x” application
       - Integrates 15 Oracle, 1 SQL Server, and 1 Excel
       - Presents Google-like interface
       - Search on protein, compound, gene, target, …
     • Support for text mining
       - Separate tool analyzes documents from 5 systems
       - Tool produces output data for each
       - CIS federates data to be read by search tool
  6. Use Case 2: EMT
     • Federation and insulation layer
       1. Originally web service reading Oracle database
       2. Put CIS between them and added SQL database
       3. Added another Oracle database
       4. Migrated data from SQL to 2nd Oracle
       5. Removed SQL
       6. Plan to replace remaining Oracles with other tool
     • After 2, no changes to web service
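The insulation idea in this use case can be sketched in a few lines. This is only an analogy, not CIS itself: the `VirtualView` class, its `rebind` and `fetch_all` methods, the `samples` table, and the row values are all hypothetical, and in-memory SQLite databases stand in for the Oracle and SQL sources. The point is that the consumer keeps calling the same view while the sources behind it change.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one physical source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    return conn


class VirtualView:
    """Hypothetical insulation layer: consumers only ever call fetch_all()."""

    def __init__(self, sources):
        self._sources = list(sources)

    def rebind(self, sources):
        # Swap the physical sources; the consumer-facing method is untouched.
        self._sources = list(sources)

    def fetch_all(self):
        rows = []
        for conn in self._sources:
            rows.extend(conn.execute("SELECT id, name FROM samples").fetchall())
        return sorted(rows)


def web_service(view):
    """Stand-in for the web service, which knows the view but not the sources."""
    return [name for _id, name in view.fetch_all()]


# Step 2: the view sits in front of the first Oracle stand-in plus a SQL stand-in.
oracle_1 = make_source([(1, "alpha")])
sql_db = make_source([(2, "beta")])
view = VirtualView([oracle_1, sql_db])
before = web_service(view)

# Steps 3-5: add a second Oracle stand-in, migrate the SQL rows, drop SQL.
oracle_2 = make_source([(2, "beta")])
view.rebind([oracle_1, oracle_2])
after = web_service(view)
```

In this toy version `web_service` is never edited, which mirrors the slide's claim that the web service needed no changes after step 2.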
  7. Data Virtualization Service
     • Remit: Provide projects and data analysts with well-defined virtual data entities delivered on a stable, high-performance platform
     • Four disciplines: Consultation, Development, Support, and Operations
     • Consult whether to use virtualization or not
     • Development by service or project developers
     • Specialized skills for support
     • Entity-centric approach to provisioning
  8. Warehousing or Federation (or Both)?

     Criterion         | Warehousing                        | Federation
     ------------------|------------------------------------|--------------------------------------------
     Data Quality      | Low (needs cleansing)              | High (usable as-is)
     Result Sets       | Large (no WHERE clause)            | Small (very selective)
     Footprint Limits  | No problem standing up new DB’s    | New DB’s are unwelcome
     Data Protection   | Data can be rehosted freely        | Restrictions on moving data (e.g. GDPR)
     Metadata          | Schema does not change often       | Many new tables/views/columns, type changes
     Landscape         | New data sources are rare          | Often adding new sources
     Time to Solution  | Solution not needed urgently       | Need solution right away
     Data Availability | Data sources often unavailable     | Sources highly available
     Source Load       | Data sources under heavy load      | Sources not heavily used
     Data Freshness    | Don’t need latest data immediately | Always want freshest data
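One way to use the comparison above during consultation is as a simple checklist tally. The sketch below is an assumption, not something the slides describe: the short answer keys are paraphrases of the slide wording, every criterion is weighted equally, and treating a tie as "both" is an arbitrary choice made for illustration.

```python
from collections import Counter

# (criterion, answer favouring warehousing, answer favouring federation).
# The short answer keys are paraphrases of the slide text, chosen for this sketch.
MATRIX = [
    ("Data Quality", "low", "high"),
    ("Result Sets", "large", "small"),
    ("Footprint Limits", "new DBs fine", "new DBs unwelcome"),
    ("Data Protection", "free to rehost", "restricted"),
    ("Metadata", "stable", "changing"),
    ("Landscape", "stable", "growing"),
    ("Time to Solution", "not urgent", "urgent"),
    ("Data Availability", "often unavailable", "highly available"),
    ("Source Load", "heavy", "light"),
    ("Data Freshness", "can lag", "must be fresh"),
]

LOOKUP = {(crit, w): "warehousing" for crit, w, _f in MATRIX}
LOOKUP.update({(crit, f): "federation" for crit, _w, f in MATRIX})


def tally(answers):
    """Count how many answered criteria point to each approach."""
    return Counter(LOOKUP[(crit, value)] for crit, value in answers.items())


def lean(answers):
    """Return the majority approach; a tie is reported as 'both' (an assumption)."""
    votes = tally(answers)
    w, f = votes["warehousing"], votes["federation"]
    if w == f:
        return "both"
    return "warehousing" if w > f else "federation"


project = {"Data Quality": "high", "Result Sets": "small", "Source Load": "heavy"}
recommendation = lean(project)
```

A real assessment would likely weight some criteria (for example data protection restrictions) far more heavily than others; the equal weighting here only keeps the sketch short.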
  9. Four Disciplines
     Consultation
     • Assess architectural fit
     • Estimate effort
     • Advise on requirements
     • Train project developers
     • Design Info Blueprint layer
     Development
     • Gather requirements
     • Design & develop solution
     • Engage data source owners
     • Engage support groups
     • Perform knowledge transfer
     • Support during warranty
     Support
     • Engage with development team throughout project
     • Receive knowledge transfer and confirm completeness
     • Handle incidents and problems, working with app and data owners
     • Handle planned changes, working with app and data owners
     Operations
     • Monitor servers and software
     • Adjust configuration as needed
     • Scale footprint to meet demand
     • Manage access
     • Promote to Test, deploy to Prod
     • Upgrade/patch CIS regularly
  10. Entity-centric Design
     • Temptation to “wire through”
       - Connect to sources, integrate, publish service
       - Repeat for next service, integrate slightly differently
       - Same data entity looks different each time
     • Need single version of the truth
       - Stable entity definitions – services combine differently
     • Allows us to change the conversation
       - Old req’s: “I need to integrate data from A, B, and C”
       - New req’s: “I need to integrate X’s with Y’s and Z’s”
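The contrast between "wiring through" and entity-centric design can be illustrated with a small sketch. Everything in it is hypothetical: the three sources, their column names, the `Compound` and `Target` entities, and the made-up rows. Each entity is defined once against its source, and two different services then combine the same entity definitions rather than re-integrating the sources themselves.

```python
from dataclasses import dataclass

# Hypothetical raw sources A, B, and C, each with its own naming quirks.
SOURCE_A = [{"cmpd_id": "C1", "cmpd_name": "compound-one"}]
SOURCE_B = [{"COMPOUND": "C1", "TARGET": "T9"}]
SOURCE_C = [{"target_id": "T9", "gene": "GENE1"}]


@dataclass(frozen=True)
class Compound:
    id: str
    name: str


@dataclass(frozen=True)
class Target:
    id: str
    gene: str


# Entity layer: the only place that knows the source column names.
def compounds():
    return [Compound(r["cmpd_id"], r["cmpd_name"]) for r in SOURCE_A]


def targets():
    return [Target(r["target_id"], r["gene"]) for r in SOURCE_C]


def links():
    return [(r["COMPOUND"], r["TARGET"]) for r in SOURCE_B]


# Service layer: two services combine the same entities in different ways.
def targets_for_compound(name):
    compound_ids = {c.id for c in compounds() if c.name == name}
    target_ids = {t for c, t in links() if c in compound_ids}
    return [t for t in targets() if t.id in target_ids]


def compounds_for_gene(gene):
    target_ids = {t.id for t in targets() if t.gene == gene}
    compound_ids = {c for c, t in links() if t in target_ids}
    return [c for c in compounds() if c.id in compound_ids]
```

Both services see the same `Compound` and `Target` shapes, so a requirement can be phrased in terms of entities ("compounds with targets") instead of sources ("data from A, B, and C").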
  11. Building Project by Project
     • Not proactively building data entity layer
       - Hard to justify, not sure getting most important entities
     • Project-based approach
       - Project 1 is new top-to-bottom, includes some entities
       - At close, move entities into common repository
       - Project 2 reuses some entities, adds some more locally
       - At close, augment and add to entities in repository
       - …
     • Repository grows and includes important entities
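The project-by-project growth described above can be pictured as a merge at each project close. This is a minimal sketch under stated assumptions: the repository is modelled as a plain mapping from entity name to a set of attribute names, and `close_project`, the entity names, and the attributes are all hypothetical.

```python
def close_project(repository, project_entities):
    """Return a new repository with the project's entities merged in.

    New entities are added; entities that already exist are augmented
    with any attributes the project introduced.
    """
    merged = {name: set(attrs) for name, attrs in repository.items()}
    for name, attrs in project_entities.items():
        merged.setdefault(name, set()).update(attrs)
    return merged


repository = {}

# Project 1 is built top to bottom and contributes its entities at close.
project_1 = {"Compound": {"id", "name"}, "Target": {"id"}}
repository = close_project(repository, project_1)

# Project 2 reuses Compound, extends it locally, and adds a new entity.
project_2 = {"Compound": {"id", "structure"}, "Assay": {"id"}}
repository = close_project(repository, project_2)
```

After two projects the repository holds three entities, with `Compound` carrying attributes from both, without anyone having built the entity layer up front.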
  12. Questions?
     Thank You
