1
Harnessing Your Hybrid Data Ecosystem
Unleashing the power of data with data virtualization.
Lakshmi Randall, Twitter: @LakshmiLJ
Director of Product Marketing
July 2017
2
Multi-Platform Architecture: Reality of the Modern Enterprise
• Diverse Ingestion and Integration Needs: Batch, Real-time, Continuous, Right-time, On-demand
• Diverse Data Architectures: Cloud, Data lakes, DW, Data Hub, Distributed
• Diverse Governance and Metadata Needs: Local Metadata, Centralized Metadata, Metadata Exchange, Local Governance, Centralized Governance
• Diverse Skillsets
3
Hybrid Data Ecosystem
A Hybrid Data Ecosystem (HDE) comprises multivarious data, processes and technologies that enable enterprises to optimally harness insights from disparate data sources.
Hybrid Characteristics
• Legacy & Modern
• Multi-Platform
• Distributed Architectures
• Batch & Real-time
• Structured & Unstructured
• Cloud & On-Premises
• Open Source & Commercial
• Diverse Data
• Domain-specific Views
Most data warehouses are now multi-platform hybrid architectures.
• Central monolithic EDW with no other data platforms: 15%
• Central EDW with a few additional data platforms: 37%
• Central EDW with many additional data platforms: 16%
• Many workload-specific data platforms with a non-central EDW: 15%
• No true EDW, but many workload-specific data platforms instead: 15%
• Other: 2%
The spectrum runs from a single EDW (enterprise data warehouse) to a multi-platform DWE (data warehouse environment). Multi-platform hybrid is the new norm; the monolith was the norm in the ’90s and is now rare.
Source: 2014 TDWI report “Evolving Data Warehouse Architectures.” Based on 538 respondents.
5
Benefits and Challenges of HDEs
BENEFITS
• Enables business goals
• Flexibility to support data diversity
• Cost optimization opportunities
• Supports prototyping of new business models
• Multiple Systems of Insight
CHALLENGES
• Data Ownership
• Integration and Unification
• Data Quality Risks
• Skillset Scarcity
• Optimization Issues
• Multiple data models
• Lack of Holistic View
• Multiple Local Architectures
6
Harnessing Insights from HDEs
Achieving technical cohesion and business value in a multi-platform environment
Costs of Complexity:
“Just Because It’s Difficult To Quantify, Doesn’t Mean It’s Zero! (But That’s How It’s Often Treated!)”
– The Hands-on Group
One or more architectures or layers must unify the disparate systems and data assets of the HDE, both to make the HDE’s complexity understandable and to mask it from data consumers.
7
The Solution – A Data Abstraction Layer
• Abstracts access to disparate data sources
• Acts as a single (virtual) repository
• Makes data available to consumers in real-time
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
8
Five Essential Capabilities of Data Virtualization
1. Data abstraction
2. Zero replication, zero relocation
3. Real-time information
4. Self-service data services
5. Centralized metadata, security & governance
9
1. Data abstraction
Abstracts access to disparate data sources.
Acts as a single virtual repository.
Abstracts data complexities such as location, format, and protocols.
…hides data complexity so that business users can access data easily
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research
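The abstraction idea on this slide can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not the Denodo Platform’s API: `VirtualRepository`, `csv_source` and `json_source` are names invented here. Each adapter hides where and how its data is stored, and consumers query every source the same way, by view name.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
Reader = Callable[[], Iterator[Row]]

def csv_source(text: str) -> Reader:
    """Hypothetical adapter for a delimited file: hides the format, yields dict rows."""
    def read() -> Iterator[Row]:
        yield from csv.DictReader(io.StringIO(text))
    return read

def json_source(text: str) -> Reader:
    """Hypothetical adapter for a JSON payload, e.g. a web service response."""
    def read() -> Iterator[Row]:
        yield from json.loads(text)
    return read

class VirtualRepository:
    """One logical entry point over many physical sources (sketch only)."""
    def __init__(self) -> None:
        self._views: Dict[str, Reader] = {}

    def register(self, name: str, reader: Reader) -> None:
        self._views[name] = reader

    def query(self, name: str, columns: List[str]) -> List[Row]:
        # Consumers name a view and the columns they want; location,
        # format and protocol stay hidden inside the adapter.
        return [{c: row[c] for c in columns} for row in self._views[name]()]

repo = VirtualRepository()
repo.register("customers", csv_source("id,name\n1,Acme\n2,Globex\n"))
repo.register("orders", json_source('[{"id": 1, "total": 250.0}]'))
print(repo.query("customers", ["name"]))
print(repo.query("orders", ["total"]))
```

Note that the CSV adapter returns every value as a string while the JSON adapter keeps numbers; a real abstraction layer would also normalize types.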
10
2. Zero replication, zero relocation
…reduces development time and overall TCO
“The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1 – 2 weeks it would take with ETL.”
– Manager, DrillingInfo
Leaves the data at its source; extracts only what is needed, on demand.
Diminishes the need for effort-intensive ETL processes.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
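The “extracts only what is needed, on demand” point can be illustrated with a toy model of pushdown. In this hypothetical Python sketch (`SourceTable` and `VirtualView` are invented names, and the predicate is a plain Python function rather than SQL), defining a view copies nothing, and a query hands its filter and column list to the source, so only matching rows are transferred.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Predicate = Optional[Callable[[Row], bool]]

@dataclass
class SourceTable:
    """Stand-in for a remote table that counts how many rows it sends out."""
    rows: List[Row]
    rows_shipped: int = 0

    def fetch(self, columns: List[str], predicate: Predicate = None) -> List[Row]:
        # Filtering and projection happen at the source ("pushdown"),
        # so only the requested rows and columns are transferred.
        result = [
            {c: r[c] for c in columns}
            for r in self.rows
            if predicate is None or predicate(r)
        ]
        self.rows_shipped += len(result)
        return result

@dataclass
class VirtualView:
    """A view is only a definition; creating it copies no data."""
    source: SourceTable
    columns: List[str]

    def select(self, predicate: Predicate = None) -> List[Row]:
        return self.source.fetch(self.columns, predicate)

orders = SourceTable([
    {"id": i, "region": "US" if i % 2 else "EU", "amount": i * 10}
    for i in range(1, 1001)
])
view = VirtualView(orders, ["id", "amount"])
shipped_before_query = orders.rows_shipped  # defining the view moved nothing
big_us = view.select(lambda r: r["region"] == "US" and r["amount"] > 9900)
print(big_us, orders.rows_shipped)
```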
11
3. Real-time information
Provisions data in real-time to consumers.
Creates real-time logical views of data across many data sources.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
…enables timely decision-making
“Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
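Why a logical view stays current while a copy goes stale can be shown with a toy example. In this hypothetical Python sketch, two in-memory lists stand in for live CRM and billing sources, and `customer_balance_view` (an invented name) joins them each time it is queried; the `snapshot` variable plays the role of a periodically loaded copy.

```python
from typing import Dict, List

Row = Dict[str, object]

# Two hypothetical live sources that other systems may update at any time.
crm_customers: List[Row] = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
billing_invoices: List[Row] = [{"cust_id": 1, "amount": 100.0}]

def customer_balance_view() -> List[Row]:
    """Logical view: joined at query time, so it reflects current source data."""
    totals: Dict[object, float] = {}
    for inv in billing_invoices:
        totals[inv["cust_id"]] = totals.get(inv["cust_id"], 0.0) + float(inv["amount"])
    return [{"name": c["name"], "balance": totals.get(c["cust_id"], 0.0)} for c in crm_customers]

snapshot = customer_balance_view()  # a one-time copy, as a batch load would produce

billing_invoices.append({"cust_id": 2, "amount": 40.0})  # a new invoice lands in the source

print(snapshot)                 # Globex still shows 0.0 in the copy
print(customer_balance_view())  # Globex now shows 40.0 in the live view
```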
12
4. Self-service data services
Facilitates access to all data, both internal and external.
Enables creation of universal semantic models reflecting business taxonomy.
Connects data silos to provide the best available information to drive business decisions.
…enables information discovery and self-service
“Impressively quick turnaround time to ‘unlock’ data from additional siloes and from legacy systems - Few vendors (if any) can compete with Denodo’s support of the RESTful/OData standard - both to provide data (northbound) and to access data from the sources (southbound).”
– Business Analyst, Swiss Re
13
5. Centralized metadata, security & governance
Abstracts data source security models and enables single-point security and governance.
Extends single-point control across cloud and on-premises architectures.
Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data.
…simplifies data security, privacy, audit
“Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial, security, use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via its web service functionality.”
– Enterprise Architect, Asurion
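Single-point security means the access rule is defined once, in the virtual layer, instead of in each source system. The following is a minimal, hypothetical Python sketch: `POLICIES` and `secure_query` are invented names, and masking with `"****"` is an assumed policy choice, not a documented product behavior.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical central policy: which roles may see which columns of each view.
POLICIES: Dict[str, Dict[str, Set[str]]] = {
    "customer_view": {
        "analyst": {"cust_id", "region"},
        "support": {"cust_id", "region", "email"},
    },
}

def secure_query(view: str, role: str, rows: List[Row]) -> List[Row]:
    """Apply one policy at the virtual layer, whichever source the rows came from."""
    allowed = POLICIES.get(view, {}).get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} may not read {view!r}")
    # Disallowed columns are masked rather than dropped, so the view keeps its shape.
    return [{k: (v if k in allowed else "****") for k, v in row.items()} for row in rows]

rows = [{"cust_id": 7, "region": "EU", "email": "jane@example.com"}]
print(secure_query("customer_view", "analyst", rows))
print(secure_query("customer_view", "support", rows))
```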
14
Definition
“Data virtualization technology can be used to create virtualized and integrated views of data in memory (rather than executing data movement and physically storing integrated views in a target data structure), and provides a layer of abstraction above the physical implementation of data.”
– Source: “Gartner Market Guide for Data Virtualization – 2016”
Data Virtualization Reference Architecture
15
16
The Role of Data Virtualization in HDEs
• Enable an Integrated Data Ecosystem
• Improve Business Agility & Productivity
• Provide Virtualized Views of the HDE
• Access data in place instead of replicating and consolidating it, where appropriate
• Centralize Metadata and Governance Policies for an HDE
• Optimize and Manage data access to an HDE
• Minimize skillset challenges in an HDE
• Provision business-ecosystem-specific views from HDEs
17
HDE: Three Perspectives
An HDE comprises multivarious data, processes and technologies that enable enterprises to optimally harness insights.
Business Perspective
• Integrated Supply Chain
• Multi-channel Marketing
• Financial Risk
• Quality Control
Enterprise Perspective
• Hybrid Characteristics: Legacy & Modern; Multi-Platform; Distributed Architectures; Batch & Real-time; Structured & Unstructured; Cloud & On-Premises; Open Source & Commercial; Diverse Data; Domain-specific Views
Technical Perspective
• Disparate Data Sources: Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
Common elements: Local & Centralized Governance, Common Data Models, Data Reuse, Shared Metadata, Data Ownership
18
Vizient’s HDE – Technical Perspective
“The Denodo Platform will provide 350% ROI over 5 years and break even within 1.5 years of our initial project and will continue to deliver additional savings every year. Further, we plan to leverage the platform in our data lake project.”
– Chuck DeVries, VP Architecture and Development, Vizient
20
Risk Data Ecosystem (RDE) – Business Perspective
Risk Systems Integration using Data Virtualization
Risk areas: financial (credit, liquidity, ...), market and operations
The RDE delivers aggregation and internal reporting of risk data that is more timely, accurate, comprehensive and granular:
• Highly automated aggregation of risk data by business line, region, asset type, industry, and legal entity.
• Adaptable and flexible process for ad hoc requests.
• Higher standards for reporting practices: reports are accurate, reconciled and validated, and tailored to the audience and context.
Virtual risk views across the bank
21
Data Marketplace – Enterprise Perspective
• BUSINESS SOLUTIONS (access information-as-a-service): BI, CPM and Reporting; Portal & Dashboards; Applications
• ENTERPRISE DATA SERVICE REGISTRY (standard metadata and enterprise data services): Enterprise Data Marketplace; Scheduling & Delivery; Reuse Data Services; Usage Stats; Metadata
• DATA VIRTUALIZATION (abstract layer for data services): Virtual Data Layer; Virtual Operational Data Stores; Virtual Data Marts
• DISPARATE DATA (any source, any format): RDBMS; NoSQL; Big Data; Web Services; Packaged Apps; Files
Data Virtualization Use Cases
Company confidential – do not forward or distribute
23
Denodo ‘Solution’ Categories
Customer Centricity / MDM
• Complete View of Customer
Data Services
• Data as a Service
• Data Marketplace
• Data Services
• Application and Data Migration
Cloud Solutions
• Cloud Modernization
• Cloud Analytics
• Hybrid Data Fabric
Data Governance
• GRC
• GDPR
• Data Privacy / Masking
BI and Analytics
• Self-Service Analytics
• Logical Data Warehouse
• Enterprise Data Fabric
Big Data
• Logical Data Lake
• Data Warehouse Offloading
• IoT Analytics
24
Customer Centricity / MDM
• Complete View of Customer
  – Customer Service Unified Desktop
  – Unified Desktop for Contact Center
  – Customer Self-Service Portal
  – Single Customer View for Back Office Automation
25
Data Governance
• GRC
  – Data Retention for Regulatory Compliance
  – Risk Reporting for Basel III Compliance
  – Single View of Risk
• GDPR
  – Data Privacy and Protection
• Data Privacy / Masking
  – Data Privacy in a Hybrid Environment
  – De-identifying Patient Data according to HIPAA Safe Harbor Rules
26
Data Services
• Data as a Service
  – Data Services for Drug Discovery
  – Unified Data Services Layer
  – Enterprise Data Service Layer
• Data Marketplace
  – Data Access Marketplace
  – Liquidity Management Dashboard
• Data Services
  – Cable Set Top Box Transaction Management
  – RESTful Web Services API for Development Teams
• Application and Data Migration
  – Migration Abstraction Layer
  – Mergers and Acquisitions
27
BI and Analytics
• Self-Service Analytics
  – Self-Service Discovery
  – Self-Service Exploration
  – Self-Service Collaboration
• Logical Data Warehouse
  – Inventory-Sales Reconciliation Reports
  – Logical Data Warehouse
  – Agile Reporting using Logical Data Warehouse
• Enterprise Data Fabric
  – Single View of Supply Chain
  – Secure Data Services Layer
28
Big Data
• Logical Data Lake
  – Single View for Customer Analytics
• Data Warehouse Offloading
  – Cost Reduction
• IoT Analytics
  – Contextual Data for Advanced Analytics
29
Cloud Solutions
• Cloud Modernization
  – Application Modernization
  – Cloud Migration
• Cloud Analytics
  – Analytics in the Cloud
  – Web/Cloud/Semi-Structured Data Integration
• Hybrid Data Fabric
  – Single View of Customer for Distributor Portal
  – Automation of Service Interaction for Retail Partner Customers
30
Going Forward
What’s cooking in the virtualization space
• Web-based Information Self-Service
  – Advanced data catalog enables a centralized “data marketplace”
  – Keyword-based search
  – Collaboration (tags, comments, annotations, requests for access, etc.)
• Next-gen “Fabric” Execution Engine
  – Tighter integration with in-memory engines and data grids to move processing from the virtual layer to specialized execution engines
• Holistic Operations Console
  – Common operations web console to orchestrate monitoring, notifications, diagnosis, auditing, migration, license management, etc.
31
Summary
• HDE is inevitable in modern enterprises – embrace the diversity.
• Ensure your HDE evolution is driven by business goals.
• Virtualize data; don’t migrate or consolidate it.
• Leverage data virtualization to understand, access, unify, govern, and model your data in an HDE.
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
July 2017
Who Are We
One of the world's largest independent exploration and production companies.
Committed to Health, Safety and Environment.
Over 4,000 employees worldwide.
Committed to its Core Values of: Integrity and Trust, Servant Leadership, People and
Passion, Commercial Focus, Open Communication.
An integral part of the communities where we live, work and operate.
Recognized among the World's Most Innovative Companies by Forbes in 2012.
34
Business Need
Data - Access to Critical Information to Support Business Processes
Better – access to complete information
More – access to related information
Faster – access in real-time
Common Catalog – for the enterprise
35
Challenge
Data is Siloed Across Disparate Systems
Manually access different systems
Addressed with point-to-point data
integration
Takes too long to get answers to users
Inadequate security on source systems
36
Challenge
Friction between Business and IT
IT is too slow. Takes too long to build
solutions.
Wrong Data – Obsolete or Stale
Lack of adequate enterprise data
repositories - DW / Data Mart / Data Lake
37
Business Solutions
Temporary Solutions / Scalability
Microsoft Access
Microsoft Excel
Spotfire
38
Solution
Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single repository (virtual)
Makes data available in real-time to consumers
Integration with Active Directory (AD) for security
39
Data Virtualization – Our Journey
Projects and Timeline
Pilot Project: Jul’2016 - Sep’2016
Full Implementation in Business Unit: Oct’2016 - Feb’2017
Governance Implementation: Jun’2017 - Oct’2017
40
Data Virtualization
Our Architecture
Hardware
Virtual Server – Windows Server 2012 R2
Specs: 64-bit, 4–18 GB RAM, 4 CPUs
Denodo 6.0
Database Approach
ADMIN
CORE
ASSET
FUNCTIONAL
Data Virtualization
Use Case # 1 – Industry Subscription Data in the Cloud
Problem
Provide consistent and up-to-date access to purchased industry subscription data for data mining
• Multiple vendors
• Multiple data types (Well Locations, Oil and Gas Production Volumes, M&A Activity)
• Multiple access protocols (Azure SQL database, hosted XML files, external JSON/REST web services)
Honor internal and external security requirements and ensure adequate performance/cost
• Prevent sharing usernames and passwords
• Leverage (internal) enterprise security infrastructure
• Provide metrics/audit on usage
• Limit access as specified in agreements
• Avoid time and cost of standing up additional databases
Solution
Data Virtualization
Use Case # 2 – Logical Data Mart for Key Business Unit
Problem
Significant organizational changes due to market conditions surfaced several point solutions driving critical business processes
Reduce unnecessary copies from corporate data stores into local stores that quickly go stale and are difficult to support (e.g. multiple, duplicated mini data marts in Excel and Access)
Need ability to combine augmented or rapidly changing business-unit-specific data with corporate data
Solution
Leveraged newly formed data and analytics team in business unit(s) to provide centralized support
Partnered with corporate teams to develop managed data delivery environment (tools + process)
Built logical data mart (i.e. virtual database) to combine BU-specific and corporate data
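A logical data mart of this kind can be pictured as a join evaluated at query time rather than another copy in Excel or Access. The sketch below is a hypothetical Python illustration (`logical_data_mart` and the sample well data are invented here). It assumes business-unit rows share a key with corporate rows and, as a design assumption, lets corporate values win when both sides define the same attribute, so BU data can only add columns.

```python
from typing import Dict, List

Row = Dict[str, object]

def logical_data_mart(corporate: List[Row], bu_specific: List[Row], key: str) -> List[Row]:
    """Left-join business-unit data onto corporate records at query time (sketch)."""
    bu_by_key = {r[key]: r for r in bu_specific}
    mart: List[Row] = []
    for row in corporate:
        extra = bu_by_key.get(row[key], {})
        # Assumed rule: corporate attributes override BU values on conflict;
        # BU-only attributes (e.g. a forecast) are added alongside.
        mart.append({**extra, **row})
    return mart

corporate = [
    {"well_id": "W1", "operator": "CorpCo"},
    {"well_id": "W2", "operator": "CorpCo"},
]
bu_forecasts = [{"well_id": "W1", "operator": "OldName", "forecast_bbl": 500}]
mart = logical_data_mart(corporate, bu_forecasts, "well_id")
print(mart)
```

Because nothing is persisted, a change to either input list shows up the next time the mart is queried.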
Data Virtualization
Use Case # 3 – Streamline Well Summary and Production Data Retrieval
Problem
Needed to combine multiple data types (well header, production volumes, well spacing, forecast) from disparate systems
Updating the data set relied on many manual processes and was time-consuming
Reports ran very slowly
Using Spotfire for integration prevented reports from being run by other reporting tools
Solution
Integrated data from disparate data sources into a few views
Integrated Excel workbooks into the solution
Data Virtualization
Our Observations
Better Collaboration between Business and IT
Build solutions faster
More involvement from Data Source owners
45
Data Virtualization
Our Observations
More access to data
• Ability to expose data in multiple ways (ODBC, JDBC, OData)
• Combine data in new ways from different sources
• Ability to access non-traditional data sources (e.g. SharePoint, web services, multi-dimensional)
• Make the data sources all look like they reside in the same database
Better access to data
• Pick data from the best sources to incorporate into a mash-up view
• Find source-of-record information in a central, documented location
• Access by going directly to the source (instead of a copy)
Producers
View Designers
• Techs and Power Users will be trained by IT and/or through a train-the-trainer approach
• Techs and Power Users from each asset or functional area will build virtual views
• Views will merge asset-specific virtual views with global asset views
• Techs are hands-on daily with individual assets, giving them a deeper understanding of what each asset needs
IT
• Train business users on use and best practices
• Build global virtual views that can be used
Consumers
Data Catalog (TBD)
• Web-based tool for viewing metadata
• Ability to request access and connection info
Applications
Protocols
• ODBC
• JDBC
47
How are views to data created and accessed?
Toad™ Data Point
Recommendations - Base Cases for Use
Combine data from multiple sources in real-time
Source systems are highly available
Access different types of data: structured (DBMS), semi-structured (XLS),
unstructured (PDF, Web), web services
Simple data cleansing and less complex transformations
48
Key Discoveries
May not duplicate all functionality in every client tool (Spotfire, Excel, Access)
You are only as fast as your slowest data source
Pass-through security is difficult
You can connect to almost anything (whether or not you should)
Change management is a challenge with the current version (Denodo 6.0)
Involve source system owners in early stages
ETL may be the best solution in some cases
Integration with SAP BW is possible, but performance is a challenge
Not intended for real-time aggregation over large data sources
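The “slowest data source” discovery follows from how a federated query works: the layer cannot return a combined result until every source has answered. The hypothetical Python sketch below simulates three sources with made-up latencies (`SOURCE_LATENCY` is an assumption, not a measurement) and queries them in parallel with the standard library’s `ThreadPoolExecutor`; the total time tracks the slowest source rather than the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical response times per source, in seconds (illustrative, not measured).
SOURCE_LATENCY: Dict[str, float] = {"sql_server": 0.05, "sharepoint": 0.10, "sap_bw": 0.40}

def query_source(name: str) -> Tuple[str, List[int]]:
    """Pretend to query one source; sleeping stands in for network and source work."""
    time.sleep(SOURCE_LATENCY[name])
    return name, [1, 2, 3]

def federated_query(sources: List[str]) -> Tuple[Dict[str, List[int]], float]:
    """Query all sources in parallel; the combined result waits for every answer."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        results = dict(pool.map(query_source, sources))
    return results, time.perf_counter() - start

results, elapsed = federated_query(list(SOURCE_LATENCY))
print(f"elapsed {elapsed:.2f}s; slowest source {max(SOURCE_LATENCY.values()):.2f}s")
```

Caching or pre-materializing results for the slow source is the usual way around this limit, which is one reason ETL may still be the better choice in some cases.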
What’s next?
Business Unit Roll-out
• Rolling out governance
• Implementing metrics by end of Sep’2017
• Data Catalog
Enterprise Roll-out
• Planned for Oct’2017
50
July 13, 2017
Infosys-Noah Consulting
Industry Experience | Information Management | Operations Focused | Domain Expertise
Average 25+ years of industry experience in Information Management disciplines
Library of project accelerators honed to meet specific industry needs
Extensive experience providing solutions to the largest and most complex companies in the world
Specific Information Management Strategy and Implementation Methodologies
Industry Thought Leadership in MDM, Data Quality, Metadata, Data Virtualization
52
Why Consider Data Virtualization?
Improve business agility
Reduce latency
Provide high quality, in context data
End user self-service
Ease of change
Enable enterprise / cross-BU data integration
Access immovable data
Lower TCO
53
Leveraging DV to unlock information to accelerate and improve business performance.
Data Virtualization use cases
1. Reporting & Analytics: This is the primary use case, using DV to create a virtual data warehouse for reporting and analytics.
2. Extending EDW: Using DV to extend/upgrade existing EDWs would be a good way to expand on the value case of DV.
3. Virtual Data Marts: DV and ETL can work together to create virtual data marts on top of the existing/extended EDW platform. This use case is relevant when there is an existing trusted EDW system.
4. Registry MDM: DV can be used to create golden records on the fly, implementing a Registry MDM. Note that MDM match-and-merge logic can sometimes be fairly complex and may not always be possible to implement using DV.
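The Registry MDM use case can be sketched as a golden record assembled at query time from per-attribute survivorship rules. Everything here (`ATTRIBUTE_PRIORITY`, `golden_record`, the sample sources) is hypothetical, and records are matched by exact key only; as noted above, real match-and-merge logic is often far more complex.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical survivorship rules: for each attribute, the sources to trust, in order.
ATTRIBUTE_PRIORITY: Dict[str, List[str]] = {
    "name": ["crm", "erp"],
    "address": ["erp", "crm"],
    "phone": ["crm", "support"],
}

def golden_record(key: object, sources: Dict[str, List[Row]], key_field: str = "cust_id") -> Row:
    """Assemble a golden record on the fly; nothing is written back or persisted."""
    # Exact-key matching only: one candidate record per source, or {} if absent.
    matches = {
        name: next((r for r in rows if r.get(key_field) == key), {})
        for name, rows in sources.items()
    }
    record: Row = {key_field: key}
    for attr, order in ATTRIBUTE_PRIORITY.items():
        for src in order:
            value = matches.get(src, {}).get(attr)
            if value not in (None, ""):  # skip missing or blank values
                record[attr] = value
                break
    return record

sources = {
    "crm": [{"cust_id": 1, "name": "Acme Corp", "address": "", "phone": None}],
    "erp": [{"cust_id": 1, "name": "ACME CORPORATION", "address": "1 Main St"}],
    "support": [{"cust_id": 1, "phone": "555-0100"}],
}
golden = golden_record(1, sources)
print(golden)
```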
Lessons Learned
55
Data virtualization doesn’t solve data quality issues
Apply strong data governance to support your DV approach
• Manage data quality at the source
• Data dictionary and definitions
• Data Stewardship
DV Standards
• Library and naming standards
• Virtualization layers
56
Image from the Data Management Body of Knowledge (DMBOK), published by DAMA International (Data Management Association)
Where is my data?
Know what the authoritative source is for each attribute
Understand the data lifecycle and how data can change across systems of record (SORs)
Understand the quality of information within each SOR
Ensure data standards are applied consistently across each SOR for shared data types
Use a data catalog for easy access to information and metadata about DV views
Stick to your DV principles
Establish guiding principles on when to use DV versus other methods for data exchange
Understand your non-functional requirements (latency, lineage, performance)
Performance considerations
Adding complexity to the equation
Do not use DV for transformations beyond:
• routing data
• re-representing objects (e.g., renaming to a standard model)
• data augmentation (e.g., derived metrics)
Minimize the urge to apply complex transformations or calculations
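The two lighter transformations listed above, re-representing objects and data augmentation, might look like this in a view layer. This is a hypothetical Python sketch: the column names and the `STANDARD_NAMES` mapping are invented for illustration, and routing is not shown.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

# Hypothetical mapping from one source system's column names to the standard model.
STANDARD_NAMES: Dict[str, str] = {"WELL_NO": "well_id", "OIL_BBL": "oil_bbl", "DAYS_ON": "days_on"}

def to_standard_model(rows: List[Row]) -> List[Row]:
    """Re-represent objects: rename source columns; unknown columns keep their names."""
    return [{STANDARD_NAMES.get(k, k): v for k, v in r.items()} for r in rows]

def with_derived_metrics(rows: List[Row]) -> List[Row]:
    """Augment each row with one simple derived metric (None when days_on is zero)."""
    out: List[Row] = []
    for r in rows:
        days = r["days_on"]
        per_day: Optional[float] = r["oil_bbl"] / days if days else None
        out.append({**r, "oil_bbl_per_day": per_day})
    return out

raw = [
    {"WELL_NO": "W1", "OIL_BBL": 300, "DAYS_ON": 30},
    {"WELL_NO": "W2", "OIL_BBL": 0, "DAYS_ON": 0},
]
result = with_derived_metrics(to_standard_model(raw))
print(result)
```

Anything heavier, such as multi-step cleansing or complex business calculations, would by the principle above belong in ETL rather than in the view.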
59
General guiding principles
[Figure: Producers (App Data Stores, Files, Services, Data Warehouses, Applications) feed Consumers (Analytics, Business Application, Reporting; BI, Reports, Applications, Portals, other) through Extract / Transform / Load and through a Data Virtualization Layer. * Source – Data Virtualization book]
• Use appliances to supercharge performance
• Do not store any material business data
• Perform aggregations in the data layer
• Perform transformations in the ETL layer
• Run data quality tools against the repository to
validate/qualify data
• Pass through I/U/D queries to the transaction source
• Modeling should account for reuse and growth
• Organization to support continuous expansion
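Two of these principles, passing I/U/D (insert/update/delete) statements through to the transaction source and not storing material business data in the virtual layer, can be sketched with Python’s built-in `sqlite3` module standing in for the source of record. `PassThroughLayer` is a hypothetical name; a real data virtualization product exposes this behavior through its own configuration, not through code like this.

```python
import sqlite3

# An in-memory SQLite database stands in for the transactional source of record.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

class PassThroughLayer:
    """Hypothetical virtual layer that keeps no business data of its own."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def read(self, sql: str, params: tuple = ()) -> list:
        # Reads are answered by the source at query time.
        return self.conn.execute(sql, params).fetchall()

    def write(self, sql: str, params: tuple = ()) -> int:
        # INSERT/UPDATE/DELETE are forwarded unchanged and committed at the source.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb not in {"INSERT", "UPDATE", "DELETE"}:
            raise ValueError("write() only passes through INSERT, UPDATE or DELETE")
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(sql, params)
        return cur.rowcount

layer = PassThroughLayer(source)
layer.write("INSERT INTO orders (id, status) VALUES (?, ?)", (1, "open"))
layer.write("UPDATE orders SET status = ? WHERE id = ?", ("shipped", 1))
print(layer.read("SELECT id, status FROM orders"))
```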
Any Questions?
Thank you
James Soos
Associate Partner – E&P Practice Lead
Mobile: 936-499-8441
James.Soos@noah-consulting.com
James.Soos@infoys.com

Fast Data Strategy Houston Roadshow Presentation

  • 1.
    1 Harnessing Your HybridData Ecosystem Unleashing the power of data with data virtualization. Lakshmi Randall, Twitter:@LakshmiLJ Director of Product Marketing July 2017
  • 2.
    2 Multi-Platform Architecture Reality ofModern Enterprise Diverse Governance and Metadata Needs Diverse Ingestion and Integration Needs Diverse SkillsetsDiverse Data Architectures Batch Real-time Continuous Right-time CloudData lakes DW Data Hub Distributed On-demand Local Centralized Metadata Local Metadata Metadata Exchange Local Governance Centralized Governance
  • 3.
    3 HDE comprises multivariousdata, processes and technologies that enable enterprises to optimally harness insights Hybrid Characteristics  Legacy & Modern  Multi-Platform  Distributed Architectures  Batch & Real-time  Structured & Unstructured  Cloud & On Premises  Open Source & Commercial  Diverse Data  Domain-specific Views Disparate Data Sources Hybrid Data Ecosystem
  • 4.
    Most data warehousesare now multi-platform hybrid architectures. Source: 2014 TDWI report “Evolving Data Warehouse Architectures.” Based on 538 respondents. Other (2%) No true EDW, but many workload- specific data platforms instead Many workload-specific data platforms w/non- central EDW Central EDW with many additional data platforms Central EDW with a few additional data platforms Central monolithic EDW with no other data platforms 15%15%16%37%15% EDW DWE Multi-platform hybrid is the new norm. Monolith was norm in ‘90s; now rare.
  • 5.
    5 BENEFITS • Enables businessgoals • Flexibility to support data diversity • Cost optimization opportunities • Supports prototyping of new business models • Multiple Systems of Insight CHALLENGES • Data Ownership • Integration and Unification • Data Quality Risks • Skillset Scarcity • Optimization Issues • Multiple data models • Lack of Holistic View • Multiple Local Architectures Benefits and Challenges of HDEs
  • 6.
    6 Harnessing Insights fromHDEs Costs of Complexity: “Just Because It’s Difficult To Quantify, Doesn’t Mean It’s Zero!(But That’s How It’s Often Treated!)” The Hands-on Group One or more architectures or layers must unify the disparate systems and data assets of the HDE to understand and mask the HDE’s complexity Achieving technical cohesion and business value in a multi-platform environment
  • 7.
    7 The Solution –A Data Abstraction Layer Abstracts access to disparate data sources Acts as a single repository (virtual) Makes data available to consumers in real-time “Enterprise architects must revise their data architecture to meet the demand for fast data.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015 DATA ABSTRACTION LAYER
  • 8.
    888 Five Essential Capabilitiesof Data Virtualization 1. Data abstraction 2. Zero replication, zero relocation 3. Real-time information 4. Self-service data services 5. Centralized metadata, security & governance
  • 9.
    999 1.Data abstraction Abstracts accessto disparate data sources. Acts as a single virtual repository. Abstracts data complexities like location, format, protocols …hides data complexity for ease of data access by business Enterprise architects must revise their data architecture to meet the demand for fast data.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research
  • 10.
    101010 2.Zero replication, zerorelocation …reduces development time and overall TCO The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1 – 2 weeks it would take with ETL.” – Manager, DrillingInfo Leaves the data at its source; extracts only what is needed, on demand. Diminishes the need for effort-intensive ETL processes. Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
  • 11.
    111111 3.Real-time information Provisions datain real-time to consumers Creates real-time logical views of data across many data sources. Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches …enables timely decision-making Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
  • 12.
    121212 4. Self-service dataservices Facilitates access to all data, both internal and external Enables creation of universal semantic models reflecting business taxonomy Connects data silos to provide best available information to drive business decisions …enables information discovery and self-service Impressively quick turn around time to "unlock“ data from additional siloes and from legacy systems - Few vendors (if any) can compete with Denodo's support of the Restful/Odata standard - both to provide data (northbound) and to access data from the sources (southbound).” – Business Analyst, Swiss Re
  • 13.
    131313 5. Centralized metadata,security & governance Abstracts data source security models and enables single-point security and governance. Extends single-point control across cloud and on-premises architectures Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data. …simplifies data security, privacy, audit Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial, security, use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via it's web service functionality.” – Enterprise Architect, Asurion
  • 14.
    141414 Definition -Source: “Gartner MarketGuide for data virtualization – 2016” Data virtualization technology can be used to create virtualized and integrated views of data in memory (rather than executing data movement and physically storing integrated views in a target data structure), and provides a layer of abstraction above the physical implementation of data.”
  • 15.
  • 16.
    16 The Role ofData Virtualization in HDEs • Enable an Integrated Data Ecosystem • Improve Business Agility & Productivity • Provide Virtualized Views of HDE • Access data instead of replicating and consolidating as appropriate • Centralize Metadata and Governance Policies for a HDE • Optimize and Manage data access to a HDE • Minimize skillset challenges in a HDE • Provision business-ecosystem-specific views from HDEs
  • 17.
    17 HDE: Three Perspectives HDEcomprises multivarious data, processes and technologies that enable enterprises to optimally harness insights Integrated Supply Chain Multi-channel Marketing Financial RiskQuality Control Business Perspective Local & Centralized Governance Hybrid Characteristics  Legacy & Modern  Multi-Platform  Distributed Architectures  Batch & Real-time  Structured & Unstructured  Cloud & On Premises  Open Source & Commercial  Diverse Data  Domain-specific ViewsEnterprise Perspective Common Data Models Data Reuse Technical Perspective Disparate Data Sources Databases & Warehouses, Cloud/Saas Applications, Big Data, NoSQL, Web, XML, Excel ,PDF, Word... Shared Metadata Data Ownership
  • 18.
    18 Vizient’s HDE – Technical Perspective
  • 19.
    “The Denodo Platform will provide 350% ROI over 5 years and break even within 1.5 years of our initial project, and will continue to deliver additional savings every year. Further, we plan to leverage the platform in our data lake project.” – Chuck DeVries, VP Architecture and Development, Vizient
  • 20.
    20 Risk Data Ecosystem (RDE) – Business Perspective. Risk Systems Integration using Data Virtualization. Risk areas: financial (credit, liquidity, ...), market and operations. RDE delivers aggregation and internal reporting of risk data that is more timely, accurate, comprehensive and granular: • Highly automated aggregation of risk data by business line, region, asset type, industry, legal entity • Adaptable and flexible process for ad hoc requests • Higher standards for reporting practices: reports are accurate, reconciled, validated; tailored to the audience and context. Virtual risk views across the bank
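The aggregation “by business line, region, …” can be sketched in Python as follows. This is a simplified illustration under stated assumptions: two hypothetical risk systems are modelled as in-memory SQLite databases, and the virtual layer pushes the GROUP BY down to each source and then adds up the partial sums. Summing partial sums gives the same answer as summing all rows because SUM is decomposable; measures such as medians would not combine this way.

```python
import sqlite3
from collections import defaultdict

ALLOWED_DIMENSIONS = {"business_line", "region"}

def make_source(rows):
    """Create a hypothetical risk system holding (business_line, region, amount) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE exposure (business_line TEXT, region TEXT, amount REAL)")
    db.executemany("INSERT INTO exposure VALUES (?, ?, ?)", rows)
    return db

credit_risk = make_source([("Retail", "EMEA", 10.0), ("Corporate", "EMEA", 25.0), ("Retail", "APAC", 5.0)])
market_risk = make_source([("Corporate", "APAC", 40.0), ("Retail", "EMEA", 2.5)])

def exposure_by(dimension, sources):
    """Aggregate exposure across all sources by one dimension."""
    if dimension not in ALLOWED_DIMENSIONS:  # whitelist, since the column name is interpolated
        raise ValueError(f"unsupported dimension: {dimension}")
    totals = defaultdict(float)
    for db in sources:
        # Push the aggregation down: each source returns one partial sum per group.
        query = f"SELECT {dimension}, SUM(amount) FROM exposure GROUP BY {dimension}"
        for key, partial_sum in db.execute(query):
            totals[key] += partial_sum
    return dict(totals)

print(exposure_by("business_line", [credit_risk, market_risk]))
```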
  • 21.
    21 Data Marketplace – Enterprise Perspective. Layers, top to bottom: BUSINESS SOLUTIONS (BI, CPM and Reporting; Portals & Dashboards; Applications) – access information-as-a-service. ENTERPRISE DATA SERVICE REGISTRY (Enterprise Data Marketplace: reuse of data services, scheduling & delivery, usage stats, metadata) – standard metadata and enterprise data services. DATA VIRTUALIZATION (Virtual Data Layer: virtual operational data stores, virtual data marts) – abstract layer for data services. DISPARATE DATA (RDBMS, NoSQL, Big Data, Web Services, Packaged Apps, Files) – any source, any format.
  • 22.
  • 23.
    Company confidential – do not forward or distribute. 23 Denodo ‘Solution’ Categories: Customer Centricity / MDM (Complete View of Customer) • Data Services (Data as a Service, Data Marketplace, Data Services, Application and Data Migration) • Cloud Solutions (Cloud Modernization, Cloud Analytics, Hybrid Data Fabric) • Data Governance (GRC, GDPR, Data Privacy / Masking) • BI and Analytics (Self-Service Analytics, Logical Data Warehouse, Enterprise Data Fabric) • Big Data (Logical Data Lake, Data Warehouse Offloading, IoT Analytics)
  • 24.
    24 Denodo ‘Solution’ Categories – Customer Centricity / MDM: Complete View of Customer (Customer Service Unified Desktop; Unified Desktop for Contact Center; Customer Self-Service Portal; Single Customer View for Back Office Automation)
  • 25.
    25 Denodo ‘Solution’ Categories – Data Governance: GRC (Data Retention for Regulatory Compliance; Risk Reporting for Basel III Compliance; Single View of Risk) • GDPR (Data Privacy and Protection) • Data Privacy / Masking (Data Privacy in a Hybrid Environment; De-identifying Patient Data according to HIPAA Safe Harbor Rules)
  • 26.
    26 Denodo ‘Solution’ Categories – Data Services: Data as a Service (Data Services for Drug Discovery; Unified Data Services Layer; Enterprise Data Service Layer) • Data Marketplace (Data Access Marketplace; Liquidity Management Dashboard) • Data Services (Cable Set Top Box Transaction Management; RESTful Web Services API for Development Teams) • Application and Data Migration (Migration Abstraction Layer; Mergers and Acquisitions)
  • 27.
    27 Denodo ‘Solution’ Categories – BI and Analytics: Self-Service Analytics (Self-Service Discovery; Self-Service Exploration; Self-Service Collaboration) • Logical Data Warehouse (Inventory-Sales Reconciliation Reports; Logical Data Warehouse; Agile Reporting using Logical Data Warehouse) • Enterprise Data Fabric (Single View of Supply Chain; Secure Data Services Layer)
  • 28.
    28 Denodo ‘Solution’ Categories – Big Data: Logical Data Lake (Single View for Customer Analytics) • Data Warehouse Offloading (Cost Reduction) • IoT Analytics (Contextual Data for Advanced Analytics)
  • 29.
    29 Denodo ‘Solution’ Categories – Cloud Solutions: Cloud Modernization (Application Modernization; Cloud Migration) • Cloud Analytics (Analytics in the Cloud; Web/Cloud/Semi-Structured Data Integration) • Hybrid Data Fabric (Single View of Customer for Distributor Portal; Automation of Service Interaction for Retail Partner Customers)
  • 30.
    30 Going Forward  Web-basedInformation Self-Service • Advanced data catalog enables a centralized “data marketplace” • Keyword base search • Collaboration (tags, comments, annotations, request for access, etc.)  Next-gen “Fabric” Execution Engine • Tighter integration with in-memory and data grids to move processing from the virtual layer to specialized execution engines  Holistic Operations Console • Common operations web console to orchestrate monitoring, notifications, diagnosis, auditing, migration, license management, etc. What’s cooking in the virtualization space
  • 31.
    31 Summary • An HDE is inevitable in modern enterprises – embrace the diversity. • Ensure your HDE evolution is driven by business goals. • Virtualize data; don’t migrate or consolidate it. • Leverage data virtualization to understand, access, unify, govern, and model your data in an HDE.
  • 32.
    Thanks! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
  • 33.
  • 34.
    Who Are We: One of the world's largest independent exploration and production companies. Committed to Health, Safety and Environment. Over 4,000 employees worldwide. Committed to its Core Values of: Integrity and Trust, Servant Leadership, People and Passion, Commercial Focus, Open Communication. An integral part of the communities where we live, work and operate. Recognized among the World's Most Innovative Companies by Forbes in 2012.
  • 35.
    Business Need: Data – Access to Critical Information to Support Business Processes • Better – access to complete information • More – access to related information • Faster – access in real time • Common Catalog – for the enterprise
  • 36.
    Challenge: Data Is Siloed Across Disparate Systems • Users manually access different systems • Addressed with point-to-point data integration • Takes too long to get answers to users • Inadequate security on source systems
  • 37.
    Challenge: Friction between Business and IT • IT is too slow; it takes too long to build solutions • Wrong data – obsolete or stale • Lack of adequate enterprise data repositories (DW / Data Mart / Data Lake)
  • 38.
    Business Solutions: Temporary Solutions / Scalability • Microsoft Access • Microsoft Excel • Spotfire
  • 39.
    Solution: Data Abstraction Layer • Abstracts access to disparate data sources • Acts as a single (virtual) repository • Makes data available in real time to consumers • Integration with Active Directory (AD) for security
  • 40.
    Data Virtualization – Our Journey: Projects and Timeline • Pilot Project: Jul 2016 – Sep 2016 • Full Implementation in Business Unit: Oct 2016 – Feb 2017 • Governance Implementation: Jun 2017 – Oct 2017
  • 41.
    Data Virtualization – Our Architecture. Hardware: Virtual Server – Windows Server 2012 R2 • Specs: 64-bit, 4–18 GB RAM, 4 CPUs • Denodo 6.0. Database Approach: ADMIN, CORE, ASSET, FUNCTIONAL
  • 42.
    Data Virtualization Use Case #1 – Industry Subscription Data in the Cloud. Problem: Provide consistent and up-to-date access to purchased industry subscription data for data mining • Multiple vendors • Multiple data types (Well Locations, Oil and Gas Production Volumes, M&A Activity) • Multiple access protocols (Azure SQL Database, hosted XML files, external JSON/REST web services). Honor internal and external security requirements and ensure adequate performance/cost • Prevent sharing of usernames and passwords • Leverage (internal) enterprise security infrastructure • Provide metrics/audit on usage • Limit access as specified in agreements • Avoid the time and cost of standing up additional databases. Solution
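The security requirements in this use case (no shared source passwords, access limited per vendor agreement, usage audit) follow a common pattern: the virtual layer holds the source credentials and checks every request. A minimal Python sketch, assuming hypothetical view names, group names and a service-account field; in practice the groups would come from the directory integration (e.g. Active Directory) rather than being passed in, and the layer would actually execute the query against the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VirtualLayer:
    # Source credentials stay inside the layer; consumers never see them (hypothetical values).
    source_credentials: dict
    # Directory groups allowed to query each view, e.g. as limited by a vendor agreement.
    view_grants: dict
    # One (timestamp, user, view, allowed) tuple per request, for usage metrics and audit.
    audit_log: list = field(default_factory=list)

    def query(self, user, groups, view):
        allowed = bool(set(groups) & self.view_grants.get(view, set()))
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), user, view, allowed))
        if not allowed:
            raise PermissionError(f"{user} is not entitled to {view}")
        account = self.source_credentials[view]["account"]
        # A real layer would connect to the source with these credentials and run the query.
        return f"rows from {view} via {account}"

layer = VirtualLayer(
    source_credentials={"well_locations": {"account": "svc_vendor_a"}},
    view_grants={"well_locations": {"geoscience"}},
)
print(layer.query("alice", ["geoscience"], "well_locations"))
try:
    layer.query("bob", ["finance"], "well_locations")
except PermissionError as err:
    print("denied:", err)
```

Both the granted and the denied request end up in `audit_log`, which is what makes usage metrics possible without touching each vendor system.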
  • 43.
    Data Virtualization Use Case #2 – Logical Data Mart for Key Business Unit. Problem: Significant organizational changes due to market conditions surfaced several point solutions driving critical business processes • Reduce unnecessary copies from corporate data stores into local stores that stagnate quickly and are difficult to support (e.g. multiple, duplicated mini data marts in Excel and Access) • Need the ability to combine augmented or rapidly changing business-unit-specific data with corporate data. Solution: Leveraged the newly formed data and analytics team in the business unit(s) to provide centralized support • Partnered with corporate teams to develop a managed data delivery environment (tools + process) • Built a logical data mart (i.e. virtual database) to combine BU-specific and corporate data
  • 44.
    Data Virtualization Use Case #3 – Streamline Well Summary and Production Data Retrieval. Problem: Needed to combine multiple data types (well header, production volumes, well spacing, forecast) from disparate systems • Many manual processes were used to update the data set, which made updates time-consuming • Reports ran very slowly • Using Spotfire for integration prevented the reports from being run by other reporting tools. Solution: Integrated data from disparate data sources into a few views • Was able to integrate Excel workbooks into the solution
  • 45.
    Data Virtualization Our Observations BetterCollaboration between Business and IT Build solutions faster More involvement from Data Source owners 45
  • 46.
    Data Virtualization – Our Observations: More access to data • Ability to expose data in multiple ways (ODBC, JDBC, OData) • Combine data in new ways from different sources • Ability to access non-traditional data sources (e.g. SharePoint, web services, multi-dimensional) • Make the data sources all look like they reside in the same database. Better access to data • Pick data from the best sources to incorporate into a mash-up view • Find source-of-record information in a central, documented location • Access data by going directly to the source (instead of a copy)
  • 47.
    Producers View Designers  Techsand Power Users will be trained by IT and/or train the trainer approach  Techs and Power Users from each asset or functional area will build virtual views  Views will merge asset specific virtual views with global asset views  Techs are hands on daily with individual assets giving them a deeper understanding of what each asset needs IT  Train business users on use and best practices  Build global virtual views that can be used Consumers Data Catalog (TBD)  Web-based tool for viewing metadata  Ability to request access and connection info Applications Protocols  ODBC  JDBC 47 How are views to data created and accessed? ToadTM Data Point
  • 48.
    Recommendations – Base Cases for Use: Combine data from multiple sources in real time • Source systems are highly available • Access different types of data: structured (DBMS), semi-structured (XLS), unstructured (PDF, Web), web services • Simple data cleansing and less complex transformations
  • 49.
    Key Discoveries: May not duplicate all functionality in every client tool (Spotfire, Excel, Access) • You are only as fast as your slowest data source • Pass-through security is difficult • You can connect to almost anything (whether or not you should) • Change management is a challenge with the current version, 6.0 • Involve source system owners in early stages • ETL may be the best solution in some cases • Integration with SAP BW is possible, but performance is a challenge • Not intended for aggregations of large data sources in real time
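The observation “you are only as fast as your slowest data source” follows from how a federated query is answered: even when the virtual layer queries all sources concurrently, the combined result is only ready once every source has replied. A small Python sketch with asyncio, assuming made-up latencies for three hypothetical sources, makes that visible.

```python
import asyncio
import time

# Hypothetical response times (seconds) for three sources behind one virtual view.
SOURCE_LATENCY = {"azure_sql": 0.05, "xml_files": 0.10, "rest_api": 0.30}

async def fetch(source):
    await asyncio.sleep(SOURCE_LATENCY[source])  # stands in for network + query time
    return source

async def federated_query(sources):
    # Fan out to every source concurrently and wait for all of them.
    return await asyncio.gather(*(fetch(s) for s in sources))

start = time.perf_counter()
results = asyncio.run(federated_query(list(SOURCE_LATENCY)))
elapsed = time.perf_counter() - start
# Elapsed time tracks the slowest source (the max latency), not the sum of latencies.
print(results, f"{elapsed:.2f}s")
```

Concurrency removes the sum but not the max, so a slow source still sets the response time of every view that touches it.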
  • 50.
    What’s next? Business Unit Roll-out • Rolling out governance • Implementing metrics by end of Sep 2017 • Data Catalog. Enterprise Roll-out • Planned for Oct 2017
  • 51.
  • 52.
    Infosys-Noah Consulting: Industry Experience, Information Management, Operations Focused, Domain Expertise • Average 25+ years of industry experience in Information Management disciplines • Library of project accelerators honed to meet specific industry needs • Extensive experience providing solutions to the largest and most complex companies in the world • Specific Information Management strategy and implementation methodologies • Industry thought leadership in MDM, Data Quality, Metadata, Data Virtualization
  • 53.
    Why Consider Data Virtualization? Leveraging DV to unlock information to accelerate and improve business performance. • Improve business agility • Reduce latency • Provide high-quality, in-context data • End-user self-service • Ease of change • Enable enterprise / cross-BU data integration • Access immovable data • Lower TCO
  • 54.
    Data Virtualization Use Cases: 1. Reporting & Analytics – This is the primary use case: using DV to create a virtual data warehouse for reporting and analytics. 2. Extending EDW – Using DV to extend/upgrade existing EDWs is a good way to expand on the value case of DV. 3. Virtual Data Marts – DV and ETL can work together to create virtual data marts on top of the existing/extended EDW platform; this use case is relevant when there is an existing trusted EDW system. 4. Registry MDM – DV can be used to create golden records on the fly, implementing a registry-style MDM. Note that MDM match-and-merge logic can sometimes be fairly complex and may not always be possible to implement using DV.
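A registry-style golden record can be assembled at query time by matching records across sources and taking each attribute from the most trusted source that has a value. The Python sketch below assumes a deliberately simple exact match on a normalized e-mail address and a hypothetical per-attribute priority list; as noted above, real match-and-merge logic is often much more complex (fuzzy matching, survivorship rules, stewardship workflows).

```python
# Hypothetical customer records from two systems of record.
crm = [{"email": "Ann@Example.com", "name": "Ann Lee", "phone": None}]
billing = [{"email": "ann@example.com ", "name": "A. Lee", "phone": "555-0100"}]

# Assumed survivorship rule: for each attribute, which source is trusted first.
PRIORITY = {"name": ["crm", "billing"], "phone": ["billing", "crm"]}

def match_key(record):
    """Simple exact match on a normalized e-mail address."""
    return record["email"].strip().lower()

def golden_records(sources):
    # Group records from all sources by match key.
    grouped = {}
    for source_name, records in sources.items():
        for record in records:
            grouped.setdefault(match_key(record), {})[source_name] = record
    # Build one golden record per key, attribute by attribute.
    golden = []
    for key, by_source in sorted(grouped.items()):
        merged = {"email": key}
        for attribute, order in PRIORITY.items():
            merged[attribute] = next(
                (by_source[s].get(attribute) for s in order
                 if s in by_source and by_source[s].get(attribute) is not None),
                None,
            )
        golden.append(merged)
    return golden

print(golden_records({"crm": crm, "billing": billing}))
# [{'email': 'ann@example.com', 'name': 'Ann Lee', 'phone': '555-0100'}]
```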
  • 55.
  • 56.
    Data virtualization doesn’t solve data quality issues. Apply strong data governance to support your DV approach: • Manage data quality at the source • Data dictionary and definitions • Data stewardship. DV Standards: • Library and naming standards • Virtualization layers. (Image from the Data Management Body of Knowledge (DMBOK), published by DAMA International)
  • 57.
    Where is my data? • Know what the authoritative source is for each attribute • Understand the data lifecycle and how data can change across SORs • Understand the quality of information within each SOR • Ensure data standards are applied consistently across each SOR for shared data types • Use a data catalog for easy access to information and metadata about DV views
  • 58.
    Stick to your DV principles: • Establish guiding principles on when to use DV versus other methods for data exchange • Understand your non-functional requirements (latency, lineage, performance) • Performance considerations
  • 59.
    Adding complexity to the equation: Do not use DV for transformations beyond: • routing data • re-representing objects (e.g., renaming to a standard model) • data augmentation (e.g., derived metrics). Minimize the urge to apply complex transformations or calculations. [Figure: Extract → Transform → Load pipeline from data sources to Analytics, Business Applications and Reporting]
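The kinds of transformation the slide considers acceptable in the virtual layer, renaming to a standard model and simple derived metrics, are thin enough to express in a few lines. A Python sketch, assuming a hypothetical source schema and hypothetical standard-model names from the E&P domain:

```python
# Hypothetical mapping from one source's column names to the enterprise standard model.
STANDARD_NAMES = {"WELL_ID": "well_id", "OIL_BBL": "oil_volume_bbl", "DAYS_ON": "producing_days"}

def to_standard_model(row):
    """Re-represent a source row: rename mapped columns and drop unmapped ones, no business logic."""
    return {STANDARD_NAMES[k]: v for k, v in row.items() if k in STANDARD_NAMES}

def with_derived_metrics(row):
    """Augment the row with a simple derived metric (average daily oil rate)."""
    days = row["producing_days"]
    rate = row["oil_volume_bbl"] / days if days else None
    return {**row, "oil_rate_bbl_per_day": rate}

source_row = {"WELL_ID": "W-001", "OIL_BBL": 3000.0, "DAYS_ON": 30, "INTERNAL_FLAG": "x"}
print(with_derived_metrics(to_standard_model(source_row)))
# {'well_id': 'W-001', 'oil_volume_bbl': 3000.0, 'producing_days': 30, 'oil_rate_bbl_per_day': 100.0}
```

Anything heavier, such as multi-step cleansing or complex business calculations, belongs in the ETL layer per the slide's guidance.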
  • 60.
    General guiding principles [Figure: Producers (app data stores, files, services, data warehouses, applications) → Data Virtualization Layer → Consumers (BI and reports, applications, portals, other); *Source: Data Virtualization book] • Use appliances to supercharge performance • Do not store any material business data • Perform aggregations in the data layer • Perform transformations in the ETL layer • Run data quality tools against the repository to validate/qualify data • Pass through I/U/D (insert/update/delete) queries to the transaction source • Modeling should account for reuse and growth • Organization should support continuous expansion
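The principles “do not store any material business data” and “pass through I/U/D queries to the transaction source” can be illustrated together. In the Python sketch below, an in-memory SQLite database stands in for the transactional source (an assumption for illustration); the hypothetical `execute` function forwards insert/update/delete statements unchanged and answers reads by querying the source at request time, so the virtual layer never holds its own copy.

```python
import sqlite3

# Stand-in for the transactional system of record.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE wells (id TEXT PRIMARY KEY, status TEXT)")

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

def execute(sql, params=()):
    """Route a statement through a (hypothetical) virtual layer."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in WRITE_VERBS:
        # Pass the write through to the source and commit it there.
        with source:
            return source.execute(sql, params).rowcount
    # Reads hit the source at request time; nothing is cached in the layer.
    return source.execute(sql, params).fetchall()

execute("INSERT INTO wells VALUES (?, ?)", ("W-001", "producing"))
execute("UPDATE wells SET status = ? WHERE id = ?", ("shut-in", "W-001"))
print(execute("SELECT id, status FROM wells"))  # [('W-001', 'shut-in')]
```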
  • 61.
    Any Questions? Thank you. James Soos, Associate Partner – E&P Practice Lead. Mobile: 936-499-8441. James.Soos@noah-consulting.com James.Soos@infoys.com

Editor's Notes

  • #36 Well Header data, production data, drilling data, transactional data, well survey data. Drilling schedule data. Common catalog for data.
  • #38 Business feels that IT is too slow or takes too long.
  • #39 Superuser to the rescue. Different versions of the same solution. The downturn caused key personnel to leave, and we lost support for those solutions.