Data Virtualization:
An Introduction
The world in abstract
Speaker
Paul Fearon
Senior Solutions Consultant
Agenda
1. Data challenges in the 21st century
2. What is Data Virtualization?
3. Benefits of Data Virtualization
4. How Data Virtualization Works
5. Key takeaways
6. Q&A
Challenges of Data Management
2020’s Data Facts
Rising Volume of Data
▪ 90% of the data was produced in the past 2 years
▪ 40 zettabytes of data by the end of 2020 (5,200 GB per person on Earth)
▪ Every person will generate 1.7 MB of data per second in 2020
▪ It would take one person 181 million years to download all that data
Rising Business Challenges with Data
▪ Poor data quality costs a business between $9M and $14M a year
▪ Bad data is estimated to cost the US alone $3 trillion a year
▪ 97% of organizations are investing in AI & Big Data
▪ 93% have a multi-cloud & hybrid strategy
▪ Data scientists waste 75% of their time looking for data
Sources: 2020, Capgemini, IBM, EDC …
21st Century Data Challenges
More Data, More Insight
New sources of data
• Social media
• Mobile devices
• Increased Internet commerce/transactions
• Networked devices/sensors
New repositories
• Data Lake
• Snowflake
• Queues
New types
• Images
• Streamed data
• Video/audio
• Parquet
Increased demand
• Citizen analysts
• Customer demands
• AI/ML
• Predictive analytics
• Data science
Growing regulatory concerns
• PPI
• Reporting needs (AML/KYC, HEDIS, etc.)
• GDPR
• HR, Privacy, Tax
Increasingly complex
• SaaS/PaaS
• Cloud-based data
• Governance challenges
• More apps
Time & Cost
Information provision flow
Discovery/Requirements
Analysis
Prototype
Feedback
Develop
Test
Feedback
Release
Same old story
How do I handle data?
Gartner – The Evolution of Analytical Environments
This is a Second Major Cycle of Analytical Consolidation
[Figure: timeline of analytical environments. Diagram elements: Operational Applications, Cubes, Data Warehouses, Data Lakes, Marts, ODS, Staging/Ingest, IoT Data, Other New Data. ©2018 Gartner, Inc.]
1980s, Pre EDW: Fragmented/nonexistent analysis
› Multiple sources
› Multiple structured sources
1990s, EDW: Unified analysis
› Consolidated data
› "Collect the data"
› Single server, multiple nodes
› More analysis than any one server can provide
2000s, Post EDW: Fragmented analysis
› "Collect the data" (into different repositories)
› New data types, processing, requirements
› Uncoordinated views
2010s, LDW: Unified analysis
› Logically consolidated view of all data
› "Connect and collect"
› Multiple servers, of multiple nodes
› More analysis than any one system can provide
What is Data Virtualization?
Source: “Gartner Market Guide for Data Virtualization, November 16, 2018”
Data virtualization can be used to create virtualized
and integrated views of data in-memory rather
than executing data movement and physically storing
integrated views in a target data structure. It
provides a layer of abstraction above the physical
implementation of data, to simplify query logic.
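The difference between a virtual integrated view and a physically stored one can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and two hypothetical tables (crm_customers, billing_invoices) standing in for separate source systems; it is not how any particular data virtualization product is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables, standing in for two separate systems.
    CREATE TABLE crm_customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE billing_invoices (customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO billing_invoices VALUES (1, 100.0), (2, 250.0);

    -- Virtual: only the query definition is stored, no data is copied.
    CREATE VIEW customer_totals_virtual AS
        SELECT c.name, SUM(i.amount) AS total
        FROM crm_customers c
        JOIN billing_invoices i ON i.customer_id = c.id
        GROUP BY c.name;

    -- Physical: the integrated result is copied into a target table.
    CREATE TABLE customer_totals_copy AS
        SELECT * FROM customer_totals_virtual;
""")

# A new invoice arrives in the source after the copy was made.
conn.execute("INSERT INTO billing_invoices VALUES (1, 50.0)")

virtual_rows = conn.execute(
    "SELECT name, total FROM customer_totals_virtual ORDER BY name"
).fetchall()
copied_rows = conn.execute(
    "SELECT name, total FROM customer_totals_copy ORDER BY name"
).fetchall()

print(virtual_rows)  # reflects the new invoice
print(copied_rows)   # stale until the copy is rebuilt
```

The view is evaluated at query time, so it sees the new invoice; the copied table does not until it is reloaded.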
What is Data Virtualization?
1. Connect to disparate data sources
2. Combine related data into views
3. Consume in business applications
DATA CONSUMERS: Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users
DISPARATE DATA SOURCES: Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
[Diagram axes: Analytical to Operational; More Structured to Less Structured]
CONNECT, COMBINE, PUBLISH
• Multiple Protocols, Formats
• Query, Search, Browse
• Request/Reply, Event Driven
• Secure Delivery
• SQL, MDX
• Web Services
• Big Data APIs
• Web Automation and Indexing
CONNECT: Normalized views of disparate data
COMBINE: Discover, Transform, Prepare, Improve Quality, Integrate
CONSUME: Share, Deliver, Publish, Govern, Collaborate
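The connect, combine, consume flow can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical in-memory sources (a CSV export and a JSON feed); the function names and field names are invented for the example and do not come from any product.

```python
import csv
import io
import json

# Two hypothetical sources in different formats.
CSV_SOURCE = "id,name\n1,Acme\n2,Globex\n"
JSON_SOURCE = (
    '[{"customer_id": 1, "open_tickets": 3},'
    ' {"customer_id": 2, "open_tickets": 0}]'
)


def connect_csv(text):
    """CONNECT: expose a CSV source as normalized rows (dicts)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"]}


def connect_json(text):
    """CONNECT: expose a JSON source with the same key convention."""
    for item in json.loads(text):
        yield {"id": item["customer_id"], "open_tickets": item["open_tickets"]}


def combine(customers, tickets):
    """COMBINE: join the two normalized row streams on id."""
    tickets_by_id = {t["id"]: t for t in tickets}
    for customer in customers:
        match = tickets_by_id.get(customer["id"], {})
        yield {**customer, "open_tickets": match.get("open_tickets")}


def consume():
    """CONSUME: hand the combined view to an application as a list."""
    return list(combine(connect_csv(CSV_SOURCE), connect_json(JSON_SOURCE)))


result = consume()
print(result)
```

Nothing is read from either source until `consume()` is called, which mirrors the on-demand nature of a virtual view.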
Modern Data Virtualization
Data Virtualization enhanced with data management, automation and AI
• Delivers data more quickly than direct queries
• Leverages AI to accelerate performance and enhance the user experience
• An active data catalog to explore and govern data in real time
• Empowers data scientists with integrated data science notebooks
• Flexible support for hybrid and multi-cloud architectures
• Employs automation to speed cloud deployment and management
• Leverages SSO and fine-grained permissions to secure data assets
Six Essential Capabilities of Data Virtualization
1. Data abstraction
2. Zero replication, zero relocation
3. Real-time information
4. Self-service data services
5. Centralized metadata, security & governance
6. Location-agnostic architecture for multi-cloud, hybrid acceleration
1. Data abstraction
Abstracts access to disparate data sources.
Acts as a single virtual repository.
Abstracts data complexities like location, format, protocols
…hides data complexity for ease of data access by business
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research
2. Zero replication, zero relocation
…reduces development time and overall TCO
“The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1 – 2 weeks it would take with ETL.”
– Manager, Enervus
Leaves the data at its source; extracts only what is needed, on demand.
Diminishes the need for effort-intensive ETL processes.
Eliminates unnecessary data redundancy.
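"Extracts only what is needed, on demand" can be pictured as pushing the column list and the filter down to the source instead of copying the whole table. The sketch below assumes an in-memory SQLite database as a stand-in source; the VirtualTable class and its fetch method are hypothetical names for illustration.

```python
import sqlite3


def make_source():
    """Build a hypothetical source system with 1,000 order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(i, "EU" if i % 2 else "US", float(i), "x" * 10) for i in range(1, 1001)],
    )
    return conn


class VirtualTable:
    """Holds no data; every fetch is translated into a query on the source."""

    def __init__(self, conn, table, columns):
        self.conn = conn
        self.table = table
        self.columns = columns

    def fetch(self, columns, where_column, where_value):
        # Only known column names are interpolated into the SQL text;
        # the filter value is passed as a bound parameter.
        for col in (*columns, where_column):
            if col not in self.columns:
                raise ValueError(f"unknown column: {col}")
        sql = (
            f"SELECT {', '.join(columns)} FROM {self.table} "
            f"WHERE {where_column} = ?"
        )
        return self.conn.execute(sql, (where_value,)).fetchall()


orders = VirtualTable(make_source(), "orders", ("id", "region", "amount", "notes"))
rows = orders.fetch(["id", "amount"], "id", 42)
print(rows)
```

Only one row and two columns leave the source, rather than all 1,000 rows being replicated first and filtered afterwards.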
3. Real-time information
Provisions data in real-time to consumers
Creates real-time logical views of data across many data sources.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches
…enables timely decision-making
“Denodo’s data fabric design relies on data virtualization to provide integrated data quickly to business users to effect faster outcomes.”
– Gartner Magic Quadrant for Data Integration Tools, 18 August 2020
4. Self-service data services
Facilitates access to all data, both internal and external
Enables creation of universal semantic models reflecting business taxonomy
Connects data silos to provide best available information to drive business decisions
…enables information discovery and self-service
“Impressively quick turn around time to "unlock" data from additional siloes and from legacy systems - Few vendors (if any) can compete with Denodo's support of the Restful/Odata standard - both to provide data (northbound) and to access data from the sources (southbound).”
– Business Analyst, Swiss Re
5. Centralized metadata, security & governance
Abstracts data source security models and enables single-point security and governance.
Extends single-point control across cloud and on-premises architectures
Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data.
…simplifies data security, privacy, audit
“Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial, security, use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via its web service functionality.”
– Enterprise Architect, Asurion
6. Location-agnostic architecture for multi-cloud, hybrid acceleration
Optimizes costs by migrating data, applications, and analytics workloads to cloud without impacting the business
Enables creation of hub architecture to support integration of data across mixed workloads.
End-to-end management of migrations/promotions and continuous delivery processes.
…enables cloud adoption
“Impressively quick turn around time to "unlock" data from additional siloes and from legacy systems - Few vendors (if any) can compete with Denodo's support of the Restful/Odata standard - both to provide data (northbound) and to access data from the sources (southbound).”
– Business Analyst, Swiss Re
Reference Architecture
IT: Flexible Source Architecture
Business: Flexible
Tool Choice
Business can now make faster & more sophisticated decisions, as all data is accessible by any tool of choice.
IT can now move at a cadence that suits it, without affecting the business.
Benefits – Why should I care
Benefits of Using Data Virtualization
Easier & faster access to trusted data
• For Business Users
• Simplicity: Users don’t need to navigate the complexity of the architecture. Where is the data (on-prem, cloud, multi-cloud)? How do I access it? Which location has priority?
• Agility: All data is securely delivered from a single (virtual) system
• Accessibility: Data is accessible in a variety of formats (SQL, REST, OData, GraphQL) and in a web-based Data Catalog, regardless of original format and location
• Common Semantic Layer: All users see the same definitions and data, providing data consistency
• Governed Self-Service: Users can use their own tools (BYOT) to access and query data that is governed, secure, and trusted.
Benefits of Using Data Virtualization
Faster, cheaper, simpler, easier to secure and govern
• For IT
• Abstraction: Decouples storage and processing engines from the delivery of data
• Flexibility: Allows IT to change technologies and move data without service interruptions
• Security: Centralized governance and security controls for all data assets
• Governance: The data accessed by the users can be governed, secured, and managed so that users are accessing known, trusted, and approved data sets.
• Accelerated Delivery: Because data is not replicated to a staging area or data mart before use, it is significantly quicker (up to 90% quicker) to deliver the data needed by the users.
Data Virtualization use cases
From Data Storage & Management, to Data Consumers, going through Data Governance & Security
• Decision (Real time)
• Single View (Customer 360)
• Agile BI (Self-service)
• Data Science (ML & AI)
• Apps (Mobile & web)
• Mergers & Acquisitions
• Data Marketplace
• Compliances (IFRS17, GRC)
• Data Security
• APIfication (& SQLification)
• Unified Data Layer
• Agility & Simplicity
• Real-time Delivery
• Data Abstraction
• Zero Replication
• Data Governance
• Sophisticated Optimizations
• Logical Data Warehouse/Lake
• Big Data Fabric
• Hybrid Data Fabric
• Data Integration
• Data Migration
• Refactoring & Replatforming
[Diagram layers: Data Consumption; Data Governance, Manipulation & Access; Data Storage & Management]
[Diagram consumers: Sales, HR, Executive, Marketing, Apps/API, Data Science, AI/ML]
How Data Virtualization Works
Denodo Platform 8.0 Architecture
CONSUMERS: BI Tools, Data Science Tools, SQL
DATA CATALOG: Discover - Explore - Document
DATA AS A SERVICE: RESTful / OData, GraphQL / GeoJSON
DATA VIRTUALIZATION
• CONNECT to disparate data in any location, format or latency
• COMBINE related data into views with universal semantic model
• CONSUME using BI & data science tools, data catalog, and APIs
LOGICAL DATA FABRIC: Self-Service, Hybrid/Multi-Cloud, Data Governance, Query Optimization, AI/ML Recommendations, Security
SOURCES (150+ data adapters): Traditional DB & DW, Cloud Stores, Hadoop & NoSQL, OLAP, Files, Apps, Streaming, SaaS
[Diagram callouts 1 to 11 mark the components described on the detail slides: 1 Sources, 2 Data Governance, 3 Hybrid/Multi-Cloud, 4 Self-Service, 5 Security, 6 AI/ML Recommendation, 7 Query Optimization, 8 Data Catalog, 9 Data Consumers, 10 Data Science Tools, 11 Data as a Service]
Denodo Platform – How does virtualization work?
CONSUMERS: BI Tools, Data Science Tools, SQL
DATA CATALOG: Discover - Explore - Document
DATA AS A SERVICE: RESTful / OData, GraphQL / GeoJSON
LOGICAL DATA FABRIC
SOURCES (150+ data adapters): Traditional DB & DW, Cloud Stores, Hadoop & NoSQL, OLAP, Files, Apps, Streaming, SaaS
[Diagram: views are layered between sources and consumers; the operators between views are labelled J, U, S, and A]
CONNECT (Abstraction): Base Views over the sources
COMBINE (Transformation & Cleansing): Derived Views and Unified Views
CONSUME: Customer 360 View, Virtual Data Mart View
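The layering of base, derived, unified, and consumer-facing views can be sketched with ordinary SQL views. This example assumes SQLite as a stand-in engine and hypothetical source tables; reading the diagram's J, U, S, and A labels as join, union, selection, and aggregation is an assumption, not something the slide states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables standing in for separate systems.
    CREATE TABLE src_eu_orders (cust TEXT, amount REAL);
    CREATE TABLE src_us_orders (customer_name TEXT, total REAL);
    CREATE TABLE src_crm (name TEXT, segment TEXT);
    INSERT INTO src_eu_orders VALUES (' acme ', 10.0), ('Globex', 5.0);
    INSERT INTO src_us_orders VALUES ('ACME', 20.0);
    INSERT INTO src_crm VALUES ('ACME', 'Enterprise'), ('GLOBEX', 'SMB');

    -- Base views: one-to-one abstractions of each source.
    CREATE VIEW bv_eu_orders AS SELECT cust, amount FROM src_eu_orders;
    CREATE VIEW bv_us_orders AS SELECT customer_name, total FROM src_us_orders;
    CREATE VIEW bv_crm AS SELECT name, segment FROM src_crm;

    -- Derived views: cleansing and renaming to a common shape.
    CREATE VIEW dv_eu_orders AS
        SELECT UPPER(TRIM(cust)) AS customer, amount FROM bv_eu_orders;
    CREATE VIEW dv_us_orders AS
        SELECT UPPER(TRIM(customer_name)) AS customer, total AS amount
        FROM bv_us_orders;

    -- Unified view: union of the derived views.
    CREATE VIEW uv_orders AS
        SELECT customer, amount FROM dv_eu_orders
        UNION ALL
        SELECT customer, amount FROM dv_us_orders;

    -- Consumer-facing view: join plus aggregation.
    CREATE VIEW customer_360 AS
        SELECT c.name AS customer, c.segment, SUM(o.amount) AS total_amount
        FROM bv_crm c
        JOIN uv_orders o ON o.customer = c.name
        GROUP BY c.name, c.segment;
""")

rows = conn.execute(
    "SELECT customer, segment, total_amount FROM customer_360 ORDER BY customer"
).fetchall()
print(rows)
```

Each layer is only a definition on top of the one below it, so a change in a source table is visible in customer_360 on the next query.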
Key Takeaways
Reduce Complexity, Time and Money
Data architectures are getting more complex, more diverse, and more distributed. Traditional data integration and management approaches are too expensive, slow and complex.
Multiple Use Case Support
Enables a wide range of use cases, from self-service analytics and data services to centralized data governance and compliance, innovation platform.
Abstraction helps
Presenting unified business consumable informational assets. Separation of storage and access.
Governance
Governed access across silos of data.
Agility
Faster provisioning of business consumable data sets. Rapid prototyping capabilities and empowerment of business users.
Thank you
Sources
1
DETAILS
• 150+ data adapters
• Relational, parallel, multidimensional, and in-memory databases
• Cloud data warehouses
• NoSQL databases and Hadoop ecosystem
• SOAP/REST web services, SaaS applications
• Enterprise systems, web and file systems
• JMS queues and streaming technologies
• New adapters for Databricks Delta, Azure Synapse, Google BigQuery and cloud data storage
BENEFITS
• Agile connectivity to new data sources within minutes
• Broad range of source connectivity options
• Rapid integration of new sources
• Faster time to market
SUMMARY
Industry’s broadest range of source connectivity options
USED BY
Data Engineers and Integrators
“All available connectors are included within the Denodo Platform's cost”
– Gartner Magic Quadrant for Data Integration Tools, 2017
Data Governance
2
DETAILS
• Metadata repository with multiple visualization options
• Discover, introspect and transform source metadata
• Refresh or propagate source metadata when it changes
• Data lineage, change impact and dependency analysis
• Ability to integrate with third-party governance tools and catalogs
BENEFITS
• Delivery of consistent, curated and contextual data to users
• Controlled data virtualization and enterprise data services capabilities
SUMMARY
Comprehensive data and metadata governance
USED BY
Data Engineers and Integrators
Data Stewards and Analysts
“Using Data virtualization to Harden Business Users’ self-Authored BI Application”
– Forrester: Divide (BI Governance From Data Governance) And Conquer, 2017
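Data lineage and change impact analysis, listed in the details above, both reduce to walking a dependency graph of views. The sketch below is a generic illustration with a hypothetical DEPENDS_ON mapping; it does not reflect how any specific governance tool stores its metadata.

```python
from collections import defaultdict

# Hypothetical metadata: each view lists the objects it reads from.
DEPENDS_ON = {
    "bv_crm": ["crm.customers"],
    "bv_orders": ["erp.orders"],
    "dv_orders_clean": ["bv_orders"],
    "customer_360": ["bv_crm", "dv_orders_clean"],
}


def reachable(node, graph):
    """Return every node reachable from `node` by following graph edges."""
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def lineage(view):
    """Lineage: everything upstream that the view ultimately reads from."""
    return reachable(view, DEPENDS_ON)


def impact(obj):
    """Change impact: everything downstream that would be affected."""
    reverse = defaultdict(list)
    for view, inputs in DEPENDS_ON.items():
        for source in inputs:
            reverse[source].append(view)
    return reachable(obj, reverse)


print(sorted(lineage("customer_360")))
print(sorted(impact("erp.orders")))
```

Lineage answers "where does this view's data come from", and impact answers "what breaks if this source changes"; both use the same traversal in opposite directions.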
Hybrid/Multi-Cloud
3
DETAILS
• Ready-to-use and available on the AWS, Azure, and Google marketplaces and as a Docker container.
• Automated installation, configuration, deployment and upgrade of clusters in hybrid and multi-cloud environments
• A wide range of capacity options. Flexible rent-by-the-hour options.
• Centralized metadata management across multiple locations.
• One Denodo instance can be a source for another Denodo instance. Enables convenient, layered, regional architectures.
• Orchestrate, Audit, Monitor, and Govern
BENEFITS
• Deploy at any location: on-premises, multi-cloud, and edge. Multi-location architecture for maximum flexibility.
• Minimize expensive data movement and maximize local processing.
• Optimize costs by migrating data, applications, and analytics workloads to cloud without impacting the business.
• Fully integrated Diagnostic and Monitoring Tool with Solutions Manager, making it easy to manage clusters
SUMMARY
Multi-location architecture for multi-cloud, hybrid, and edge scenarios with automated infrastructure management capability
USED BY
Cloud-first Enterprises
Self-Service
4
DETAILS
• Linked data services for self-service data discovery, browsing and exploration
• Users can drill down in data views to examine the data itself
• Denodo data catalog for self-service global data search, relevant to the user
• Users can find, share and reuse all datasets available through the data virtualization layer
BENEFITS
• Easy for business users to create a catalog of business views and classify them according to business categories
• Business users and LOB executives become less dependent on IT organization
• Denodo data catalog empowers a community of analysts and decision makers by creating a digital marketplace
SUMMARY
Easy data exploration and discovery by business users in a self-serviceable manner
USED BY
Data Engineers and Integrators
Application & API Integrators
“Finding the right data quickly is essential in the age of self-service analytics”
– Dave Wells, Eckerson Group, 2017
Security
5
DETAILS
• Role-based access control to data services, sources, and enterprise tools
• Single sign-on using Kerberos; Security delegation; SAML, OAuth Support.
• Row and column level fine-grained authorization
• User authentication using LDAP, Active Directory
• Data Encryption, Masking, Tokenization and Redaction for data privacy
BENEFITS
• Easy to enforce security and policies in one central place, the data virtualization layer
• Consistent security model for all sources and all applications
• Secures both data and metadata.
SUMMARY
All-encompassing unified security layer for data delivery
USED BY
Data Engineers and Integrators
Application & API Integrators
“Denodo’s data fabric solution integrates key data management components, including data integration, data ingestion, data transformation, data governance and security, to support new and emerging use cases including customers 360, real-time and on-demand analytics, IoT analytics, and self-service analytics.”
– The Forrester Wave™: Enterprise Data Fabric, Q2 2020
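Row-level and column-level authorization with masking, as listed in the Security details, can be pictured as a policy applied in one place before any rows reach the consumer. The following is a minimal sketch with hypothetical roles, columns, and a simple masking rule; real platforms express such policies in their own configuration, not in code like this.

```python
def mask(value):
    """Mask all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


# Hypothetical policies: visible columns, masked columns, and a row filter.
POLICIES = {
    "analyst": {
        "columns": ["name", "region", "card_number"],
        "masked": {"card_number"},
        "row_filter": lambda row: row["region"] == "EU",
    },
    "admin": {
        "columns": ["name", "region", "card_number", "salary"],
        "masked": set(),
        "row_filter": lambda row: True,
    },
}

ROWS = [
    {"name": "Ana", "region": "EU", "card_number": "4111111111111111", "salary": 50},
    {"name": "Bob", "region": "US", "card_number": "5500000000000004", "salary": 60},
]


def secure_view(rows, role):
    """Apply the role's row filter, column list, and masking in one place."""
    policy = POLICIES[role]
    for row in rows:
        if not policy["row_filter"](row):
            continue  # row-level restriction
        yield {
            col: mask(row[col]) if col in policy["masked"] else row[col]
            for col in policy["columns"]  # column-level restriction
        }


analyst_rows = list(secure_view(ROWS, "analyst"))
admin_rows = list(secure_view(ROWS, "admin"))
print(analyst_rows)
```

Because every consumer goes through `secure_view`, the same rules apply regardless of which source the rows came from or which tool requested them.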
AI/ML Recommendation
6
DETAILS
• Past activity metadata-based ML process to automate fabric management activities
• Denodo uses ML to automatically propose and choose the best summaries for faster processing
• ML process to predict workload peaks for Denodo in the cloud and auto-scale accordingly
• ML-based recommendations of similar datasets, and datasets interesting for similar users
BENEFITS
• Significant reduction in data search and discovery time
• Accelerate advanced analytics and data science
• Cost reduction in Denodo cloud usage
SUMMARY
Automate data fabric management and processes using ML
USED BY
Citizen Analysts and Integrators
“Denodo’s AI/ML capabilities, as well as automation, continue to enhance its capabilities across data fabric components!”
– The Forrester Wave™: Enterprise Data Fabric, Q2 2020
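The idea of recommending "datasets interesting for similar users" can be illustrated with a very small usage-based recommender. This sketch uses Jaccard similarity over a hypothetical usage log; it is a stand-in technique chosen for brevity and says nothing about the ML methods the platform actually uses.

```python
# Hypothetical usage metadata: which datasets each user has queried.
USAGE = {
    "ana": {"sales", "customers", "returns"},
    "ben": {"sales", "customers"},
    "cho": {"hr_headcount"},
}


def jaccard(a, b):
    """Similarity of two sets: size of overlap divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, usage=USAGE):
    """Rank datasets the user has not used, weighted by user similarity."""
    mine = usage[user]
    scores = {}
    for other, theirs in usage.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        for dataset in theirs - mine:
            scores[dataset] = scores.get(dataset, 0.0) + similarity
    # Keep only datasets backed by at least one similar user.
    candidates = [d for d, s in scores.items() if s > 0]
    return sorted(candidates, key=lambda d: (-scores[d], d))


ben_recs = recommend("ben")
cho_recs = recommend("cho")
print(ben_recs)
```

Ben shares two datasets with Ana, so Ana's remaining dataset is suggested to him; Cho has no overlap with anyone and gets no suggestion.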
Query Optimization
7
DETAILS
• Dynamic Query Optimizer for best-in-class optimization
• Smart query acceleration using Summaries for complex analytical scenarios
• Offers partial and full aggregation pushdown
• Native integration with existing MPP and in-memory systems for query acceleration
• Widest range of caching configuration options: partial (for frequently used reports) or full cache (for data intensive analytical applications)
BENEFITS
• Significant reduction in query execution time
• Significant reduction of network traffic
• Exploits the processing power of data sources to maximize local processing
• Cost reduction in cloud use cases
• “Full cache mode” avoids accessing data sources
SUMMARY
Unparalleled performance through query optimization and caching
USED BY
Data Engineers and Integrators
“Caching mechanisms range from proprietary file structures to standard relational database management system (RDBMS) tables, and also include caching of data in-memory. These caching mechanisms enhance the performance of data virtualization. Full and partial refreshing of caching can be triggered by schedules, events or rules.”
– Adopt Data Virtualization to Improve Agility and Bimodal Traits in Your Aging Data Integration, 2017
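Aggregation pushdown, one of the optimizations listed above, means asking the source to compute the aggregate so that only the result crosses the network. The sketch below compares both approaches against an in-memory SQLite table used as a hypothetical source; the function names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical source with 100 sales rows split across two regions.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU" if i % 2 else "US", i) for i in range(1, 101)],
)


def totals_without_pushdown(conn):
    """Fetch every row, then aggregate in the virtualization layer."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)


def totals_with_pushdown(conn):
    """Let the source aggregate; only one row per region is transferred."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


naive_totals, naive_transferred = totals_without_pushdown(source)
pushed_totals, pushed_transferred = totals_with_pushdown(source)
print(naive_transferred, pushed_transferred)
```

Both functions return the same totals, but the pushed-down version transfers 2 rows instead of 100, which is where the reduction in network traffic comes from.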
Data Catalog
8
DETAILS
• Google-like search capability for data and metadata
• Business-friendly new UI geared to roles such as data stewards, data analysts, and citizen analysts
• Ability to create business categories or tags
• Graphical representation of lineage, relationships
• Usage-based metadata: who, when, what, why, and how of data consumption
• Enhanced collaboration features through user warnings and comments
• Machine learning powered personalized recommendation for data sets
BENEFITS
• Data at the speed of business
• Users can easily search for data or metadata
• Facilitates Sharing and Collaboration
• Enhanced user experience with smart ranking of search results
SUMMARY
The only data virtualization solution that seamlessly integrates data catalog with data delivery.
USED BY
Data Stewards and Analysts
Citizen Integrators
Citizen Analysts
“Through 2022, over 80% data lake projects will fail to deliver value as finding, inventorying and curating data will prove to be the biggest inhibitor to analytics and data science success.”
– Gartner Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders, September 12, 2019
Data Consumers
9
DETAILS
• Multiprotocol support including JDBC, ODBC, OData, GraphQL and GeoJSON
• SOAP and RESTful web services
• Output in XML, JSON and HTML for human and machine consumption
• Portal widget for major portal support
• Native Denodo connector for major BI tools such as Tableau, MicroStrategy, Cognos, and Power BI
BENEFITS
• Empower business users with relevant and contextual data at their fingertips
• Single source of truth across multiple BI tools
• Consistent security and governance across all consuming applications
• Reduced API call overhead through GraphQL
SUMMARY
Consistent view of information across any consuming application; Rationalize and Integrate multiple BI platforms.
USED BY
Citizen Integrators
Citizen Analysts
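Serving one view in several output formats, as in the XML and JSON bullet above, can be sketched with the Python standard library. The rows and tag names here are hypothetical, and the code only illustrates the idea of one result set with multiple renderings.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical result set of a published view.
ROWS = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


def to_json(rows):
    """Render the rows as a JSON array for machine consumers."""
    return json.dumps(rows)


def to_xml(rows, root_tag="rows", row_tag="row"):
    """Render the same rows as XML, one element per row and per field."""
    root = ET.Element(root_tag)
    for row in rows:
        element = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(element, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_output = to_json(ROWS)
xml_output = to_xml(ROWS)
print(json_output)
print(xml_output)
```

The view is defined once; each consumer picks the representation that suits its tool.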
Data Science Tools
10
DETAILS
• Data scientists can combine queries, scripts, graphics and text to create narratives
• Denodo Data Science Tool is built based on Apache Zeppelin
• Denodo users can create, save, and share their own notebooks with fellow users
• Fully integrated with Denodo’s security system and SSO capabilities
BENEFITS
• Help data scientists save time in finding data for analytics and model building
• Data scientists can easily share their findings with peers using the notebook dashboard
• Contextualizing data science models and consumption is easier through Denodo layer
SUMMARY
Data Science Notebook that is fully integrated with Denodo’s security system and SSO capabilities
USED BY
Data Scientists and Data Engineers
Data as a Service
11
DETAILS
• Expose reusable Data Services supporting multiple protocols (JDBC, ODBC, ADO.NET, REST and SOAP/XML Web Services, OData, GraphQL)
• Easily extend or specialize Data Services for specific use cases
• Full metadata introspection support
• OpenAPI (Swagger) support
• Data Lineage support
BENEFITS
• Data at the speed of business
• Users can easily search for data or metadata
• Facilitates Sharing and Collaboration
SUMMARY
Create reusable, extensible Data Services for all types of consumers
USED BY
Data Engineers and Integrators
Application and API Integrators

Data Virtualization: An Introduction

  • 1.
  • 2.
  • 3.
    Agenda 1. Data challengesin the 21 century 2. What is Data Virtualization? 3. Benefits of Data Virtualization 4. How Data Virtualization Works 5. Key takeaways 6. Q&A
  • 4.
  • 5.
    5 2020’s Data Facts RisingVolume of data ▪ 90% of the data have been produced in the past 2 years ▪ 40 zettabytes of Data by end 2020 (5 200GB / person on earth) ▪ Every person will be generating 1.7 MB data / second In 2020 ▪ It will take 181 million years for a person to download all those Data Rising Business challenges with Data ▪ Poor data quality costs business between $9 M to $14 M a year ▪ Bad data is estimated to cost US only $3 trillion a year ▪ 97% of organization are investing in AI & Big Data ▪ 93% have multi-cloud & hybrid strategy ▪ Data Scientists waste 75% looking for Data Sources: 2020, Capgemini, IBM, EDC …
  • 6.
    6 • Social Media •Mobile Devices • Increased Internet commerce/transac tions • Networked devices/sensors New sources of data New repositories •Images •Streamed data •Video/audio •Parkay New types • Citizen analysts • Customer demands • AI/ML • Predictive Analytics • Data Science Increase demand • PPI • Reporting needs (AML/KYC, HEDIS, etc.) • GDPR • HR, Privacy, Tax Growing Regulatory concerns • SaaS/PaaS • Cloud based data • Governance challenges • More apps Increasingly complex More Data More Insight 21st Century Data Challenges •Data Lake •Snowflake •Queues TIME & Cost
  • 7.
  • 8.
    8 How do Ihandle data Gartner – The Evolution of Analytical Environments This is a Second Major Cycle of Analytical Consolidation Operational Application Operational Application Operational Application Operational Application Operational Application Operational Application IoT Data IoT Data Other NewData Other NewData Operational Application Operational Application Operational Application Operational Application Cube Cube Operational Application Operational Application Cube Cube ? ? Operational Application Operational Application Operational Application Operational Application Operational Application Operational Application IoT Data IoT Data Other NewData Other NewData 1980s 1980s Pre EDW 1990s 1990s EDW 2010s 2010s 2000s 2000s Post EDW Time LDW Operational Application Operational Application Operational Application Operational Application Operational Application Operational Application Data Warehouse Data Warehouse Data Warehouse Data Warehouse Data Lake Data Lake ? ? LDW LDW Data Warehouse Data Warehouse Data Lake Data Lake Marts Marts ODS ODS Staging/Ingest Staging/Ingest Unified analysis › Consolidated data › "Collect the data" › Single server, multiple nodes › More analysis than any one server can provide ©2018 Gartner, Inc. Unified analysis › Logically consolidated view of all data › "Connect and collect" › Multiple servers, of multiple nodes › More analysis than any one system can provide Fragmented/ nonexistent analysis › Multiple sources › Multiple structured sources Fragmented analysis › "Collect the data" (Into › different repositories) › New data types, › processing, requirements › Uncoordinated views
  • 9.
    What is DataVirtualization..?
  • 10.
    10 Source: “Gartner MarketGuide for Data Virtualization, November 16, 2018” Data virtualization can be used to create virtualized and integrated views of data in-memory rather than executing data movement and physically storing integrated views in a target data structure. It provides a layer of abstraction above the physical implementation of data, to simplify query logic.
  • 11.
    11 What is DataVirtualization? Consume in business applications Combine related data into views Connect to disparate data sources 2 3 1 DATA CONSUMERS DISPARATE DATA SOURCES Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users Databases & Warehouses, Cloud/Saas Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word... Analytical Operational Less Structured More Structured CONNECT COMBINE PUBLISH Multiple Protocols, Formats Query, Search, Browse Request/Reply, Event Driven Secure Delivery SQL, MDX Web Services Big Data APIs Web Automation and Indexing CONNECT COMBINE CONSUME Share, Deliver, Publish, Govern, Collaborate Discover, Transform, Prepare, Improve Quality, Integrate Normalized views of disparate data
  • 12.
    12 Modern Data Virtualization DataVirtualization enhanced with data management, automation and AI  Delivers data more quickly than direct queries  Leverages AI to accelerate performance and enhance the user experience  An active data catalog to explore and govern data in real time  Empowers data scientists with an integrated data science notebooks  Flexible support for hybrid and multi-cloud architectures  Employs automation to speed cloud deployment and management  Leverages SSO and fine grain permissions to secure data assets
  • 13.
    13 Six Essential Capabilitiesof Data Virtualization 4. Self-service data services 5. Centralized metadata, security & governance 6. Location-agnostic architecture for multi-cloud, hybrid acceleration 1. Data abstraction 2. Zero replication, zero relocation 3. Real-time information
  • 14.
    14 1. Data abstraction Abstractsaccess to disparate data sources. Acts as a single virtual repository. Abstracts data complexities like location, format, protocols …hides data complexity for ease of data access by business Enterprise architects must revise their data architecture to meet the demand for fast data.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research
  • 15.
    15 2. Zero replication,zero relocation …reduces development time and overall TCO The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1 – 2 weeks it would take with ETL.” – Manager, Enervus Leaves the data at its source; extracts only what is needed, on demand. Diminishes the need for effort-intensive ETL processes. Eliminates unnecessary data redundancy.
  • 16.
    16 3. Real-time information Provisionsdata in real-time to consumers Creates real-time logical views of data across many data sources. Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches …enables timely decision-making Denodo’s data fabric design relies on data virtualization to provide integrated data quickly to business users to effect faster outcomes..” – Gartner Magic Quadrant for Data Integration Tools, 18 August’ 2020
  • 17.
    17 4. Self-service dataservices Facilitates access to all data, both internal and external Enables creation of universal semantic models reflecting business taxonomy Connects data silos to provide best available information to drive business decisions …enables information discovery and self-service Impressively quick turn around time to "unlock“ data from additional siloes and from legacy systems - Few vendors (if any) can compete with Denodo's support of the Restful/Odata standard - both to provide data (northbound) and to access data from the sources (southbound).” – Business Analyst, Swiss Re
  • 18.
    18 5. Centralized metadata,security & governance Abstracts data source security models and enables single-point security and governance. Extends single-point control across cloud and on-premises architectures Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data. …simplifies data security, privacy, audit Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial, security, use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via it's web service functionality.” – Enterprise Architect, Asurion
  • 19.
    19 6. Location-agnostic architecturefor multi-cloud, hybrid acceleration Optimizes costs by migrating data, applications, and analytics workloads to cloud without impacting the business Enables creation of hub architecture to support integration of data across mixed workloads. End-to-end management of migrations/promotions and continuous delivery processes. …enables cloud adoption Impressively quick turn around time to "unlock“ data from additional siloes and from legacy systems - Few vendors (if any) can compete with Denodo's support of the Restful/Odata standard - both to provide data (northbound) and to access data from the sources (southbound).” – Business Analyst, Swiss Re
  • 20.
    20 Reference Architecture IT: FlexibleSource Architecture Business: Flexible Tool Choice
  • 21.
    21 Reference Architecture IT: Flexible Source Architecture Business: Flexible Tool Choice Business can now make faster & more sophisticated decisions, as all data is accessible by any tool of choice. IT can now move at a cadence that suits speed without affecting the business.
  • 22.
    Benefits – Why should I care?
  • 23.
    23 Benefits of Using Data Virtualization Easier & faster access to trusted data • For Business Users • Simplicity: Users don’t need to navigate the complexity of the architecture. Where is the data (on-prem, cloud, multi-cloud)? How to access it? Which location has priority? • Agility: All data is securely delivered from a single (virtual) system • Accessibility: Data is accessible in a variety of formats (SQL, REST, OData, GraphQL) and in a web-based Data Catalog, regardless of original format and location • Common Semantic Layer: All users see the same definitions and data, providing data consistency • Governed Self-Service: Users can use their own tools (BYOT) to access and query data that is governed, secure, and trusted.
  • 24.
    24 Benefits of Using Data Virtualization Faster, cheaper, simpler, easier to secure and govern • For IT • Abstraction: Decouples storage and processing engines from the delivery of data • Flexibility: Allows IT to change technologies and move data without service interruptions • Security: Centralized governance and security controls for all data assets • Governance: The data accessed by the users can be governed, secured, and managed so that users are accessing known, trusted, and approved data sets. • Accelerated Delivery: As data is not replicated to a staging area or data mart for use, it is significantly quicker (up to 90% quicker) to deliver the data needed by the users.
  • 25.
    25 Data Virtualization use cases From Data Storage & Management, to Data Consumers, going through Data Governance & Security Decision (Real time) Single View (Customer 360) Agile BI (Self-service) Data Science (ML & AI) APPS (Mobile & web) Mergers & Acquisitions Data Marketplace Compliances (IFRS17, GRC) Data Security APIfication (& SQLification) Unified Data Layer Agility & Simplicity Real-time Delivery Data Abstraction Zero Replication Data Governance Sophisticated Optimizations Logical Data Warehouse/Lake Big Data Fabric Hybrid Data Fabric Data Integration Data Migration Refactoring & Replatforming Data Consumption Data Storage & Management Data Governance, Manipulation & Access Sales HR Executive Marketing Apps/API Data Science AI/ML
  • 26.
  • 27.
    27 Denodo Platform 8.0 Architecture CONSUMERS: DATA CATALOG (Discover - Explore - Document); DATA AS A SERVICE (RESTful / OData, GraphQL / GeoJSON); BI Tools; Data Science Tools; SQL. DATA VIRTUALIZATION: CONNECT to disparate data in any location, format or latency; COMBINE related data into views with universal semantic model; CONSUME using BI & data science tools, data catalog, and APIs. Self-Service; Hybrid/Multi-Cloud; Data Governance; Query Optimization; AI/ML Recommendations; Security. LOGICAL DATA FABRIC. SOURCES (150+ data adapters): Traditional DB & DW; Cloud Stores; Hadoop & NoSQL; OLAP; Files; Apps; Streaming; SaaS. Diagram callouts: 1 2 3 4 5 6 7 8 9 10 11
  • 28.
    28 Denodo Platform – How does virtualization work? CONSUMERS: DATA CATALOG (Discover - Explore - Document); DATA AS A SERVICE (RESTful / OData, GraphQL / GeoJSON); BI Tools; Data Science Tools; SQL. LOGICAL DATA FABRIC. SOURCES (150+ data adapters): Traditional DB & DW; Cloud Stores; Hadoop & NoSQL; OLAP; Files; Apps; Streaming; SaaS. Views shown in the diagram: Customer 360 View; Virtual Data Mart View; Unified View (x4); Derived View (x2), Transformation & Cleansing; Base View (x7), Abstraction. Operator markers: U, J, A, S. Stages: CONNECT, COMBINE, CONSUME
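The layering on this slide (base views for abstraction, derived views for transformation and cleansing, and a unified Customer 360 view on top) can be illustrated with a small sketch. This is a toy model under stated assumptions, not Denodo's implementation: two in-memory SQLite databases stand in for separate source systems, and the function names `base_view`, `derived_customers` and `customer_360` are hypothetical.

```python
import sqlite3

# Two independent in-memory databases stand in for separate source systems.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, " alice "), (2, "BOB")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 75.5)]
)


def base_view(conn, table):
    """Base view: expose a source table as plain rows (abstraction)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def derived_customers():
    """Derived view: cleanse the names coming from the CRM base view."""
    return [
        {"id": r["id"], "name": r["name"].strip().title()}
        for r in base_view(crm, "customers")
    ]


def customer_360():
    """Unified view: join cleansed customers with invoice totals at query time."""
    totals = {}
    for inv in base_view(billing, "invoices"):
        totals[inv["customer_id"]] = totals.get(inv["customer_id"], 0.0) + inv["amount"]
    return [
        {**c, "total_invoiced": totals.get(c["id"], 0.0)}
        for c in derived_customers()
    ]
```

Nothing is copied into a separate store here: each call to `customer_360()` reads the sources again, which is the "zero replication" idea in miniature.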
  • 29.
  • 30.
    Reduce Complexity, Time and Money: Data architectures are getting more complex, more diverse, and more distributed. Traditional data integration and management approaches are too expensive, slow and complex. Multiple Use Case Support: Enables a wide range of use cases, from self-service analytics and data services to centralized data governance and compliance, innovation platform. Abstraction helps: Presenting unified business consumable informational assets. Separation of storage and access. Governance: Governed access across silos of data. Agility: Faster provisioning of business consumable data sets. Rapid prototyping capabilities and empowerment of business users.
  • 32.
  • 33.
  • 34.
    34 Sources 1 DETAILS • 150+ data adapters • Relational, parallel, multidimensional, and in-memory databases • Cloud data warehouses • NoSQL databases and Hadoop ecosystem • SOAP/REST web services, SaaS applications • Enterprise systems, web and file systems • JMS queues and streaming technologies • New adapters for Databricks Delta, Azure Synapse, Google BigQuery and Cloud Data Storages BENEFITS • Agile connectivity to new data sources within minutes • Broad range of source connectivity options • Rapid integration of new sources • Faster time to market SUMMARY Industry’s broadest range of source connectivity options USED BY Data Engineers and Integrators “All available connectors are included within the Denodo Platform's cost” – Gartner Magic Quadrant for Data Integration Tools, 2017
  • 35.
    35 Data Governance 2 DETAILS • Metadata repository with multiple visualization options • Discover, introspect and transform source metadata • Refresh or propagate source metadata when it changes • Data lineage, change impact and dependency analysis • Ability to integrate with third-party governance tools and catalogs BENEFITS • Delivery of consistent, curated and contextual data to users • Controlled data virtualization and enterprise data services capabilities SUMMARY Comprehensive data and metadata governance USED BY Data Engineers and Integrators; Data Stewards and Analysts “Using Data virtualization to Harden Business Users’ self-Authored BI Application” – Forrester: Divide (BI Governance From Data Governance) And Conquer, 2017
  • 36.
    36 Hybrid/Multi-Cloud 3 DETAILS • Ready-to-use and available on AWS, Azure, Google Marketplaces and Docker Container. • Automated installation, configuration, deployment and upgrade of clusters in hybrid and multi-cloud environments • A wide range of capacity options. Flexible rent-by-the-hour options. • Centralized Metadata management across multiple locations. • One Denodo instance can be a source for another Denodo instance. Enables convenient, layered, regional architectures. • Orchestrate, Audit, Monitor, and Govern BENEFITS • Deploy at any location - on-premises, multi-cloud, and edge. Multi-location architecture for Maximum Flexibility. • Minimize expensive data movement and maximize local processing. • Optimize costs by migrating data, applications, and analytics workloads to cloud without impacting the business. • Fully integrated Diagnostic and Monitoring Tool with Solutions Manager, making it easy to manage clusters SUMMARY Multi-location architecture for multi-cloud, hybrid, and edge scenarios with automated infrastructure management capability USED BY Cloud-first Enterprises
  • 37.
    37 Self-Service 4 DETAILS • Linked data services for self-service data discovery, browsing and exploration • Users can drill down in data views to examine the data itself • Denodo data catalog for self-service global data search, relevant to the user • Users can find, share and reuse all datasets available through the data virtualization layer BENEFITS • Easy for business users to create a catalog of business views and classify them according to business categories • Business users and LOB executives become less dependent on IT organization • Denodo data catalog empowers a community of analysts and decision makers by creating a digital marketplace SUMMARY Easy data exploration and discovery by business users in a self-serviceable manner USED BY Data Engineers and Integrators; Application & API Integrators “Finding the right data quickly is essential in the age of self-service analytics” – Dave Wells, Eckerson Group, 2017
  • 38.
    38 Security 5 DETAILS • Role-based access control to data services, sources, and enterprise tools • Single sign-on using Kerberos; Security delegation; SAML, OAuth Support. • Row and column level fine-grained authorization • User authentication using LDAP, Active Directory • Data Encryption, Masking, Tokenization and Redaction for data privacy BENEFITS • Easy to enforce security and policies in one central place, the data virtualization layer • Consistent security model for all sources and all applications • Secures both data and metadata. SUMMARY All-encompassing unified security layer for data delivery USED BY Data Engineers and Integrators; Application & API Integrators “Denodo’s data fabric solution integrates key data management components, including data integration, data ingestion, data transformation, data governance and security, to support new and emerging use cases including customers 360, real-time and on-demand analytics, IoT analytics, and self-service analytics.” – The Forrester Wave™: Enterprise Data Fabric, Q2 2020
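Row- and column-level authorization with masking, as listed on this slide, can be pictured as a policy applied in one place before any data reaches a consumer. The sketch below is a generic illustration, not Denodo's security API: the role names, the `POLICIES` table, and the functions `mask_email` and `secure_view` are all hypothetical.

```python
def mask_email(value):
    """Masking rule: keep the domain, hide the local part."""
    _local, _, domain = value.partition("@")
    return "***@" + domain


# Hypothetical per-role policies: which rows, which columns, which masks.
POLICIES = {
    "analyst_emea": {
        "row_filter": lambda row: row["region"] == "EMEA",
        "columns": ["id", "region", "email"],
        "masks": {"email": mask_email},
    },
    "admin": {
        "row_filter": lambda row: True,
        "columns": ["id", "region", "email", "salary"],
        "masks": {},
    },
}

ROWS = [
    {"id": 1, "region": "EMEA", "email": "ann@example.com", "salary": 50000},
    {"id": 2, "region": "APAC", "email": "raj@example.com", "salary": 60000},
]


def secure_view(rows, role):
    """Apply row filtering, column projection and masking for one role."""
    policy = POLICIES[role]
    result = []
    for row in rows:
        if not policy["row_filter"](row):
            continue  # row-level restriction
        projected = {}
        for col in policy["columns"]:  # column-level restriction
            mask = policy["masks"].get(col)
            projected[col] = mask(row[col]) if mask else row[col]
        result.append(projected)
    return result
```

Because every consumer goes through `secure_view`, the same policy holds regardless of which tool issues the query, which is the point of enforcing security in a single layer.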
  • 39.
    39 AI/ML Recommendation 6 DETAILS • Past activity metadata-based ML process to automate fabric management activities • Denodo uses ML to automatically propose and choose the best summaries for faster processing • ML process to predict workload peaks for Denodo in the cloud and auto-scale accordingly • ML-based recommendations of similar datasets, and datasets interesting for similar users BENEFITS • Significant reduction in data search and discovery time • Accelerate advanced analytics and data science • Cost reduction in Denodo cloud usage SUMMARY Automate data fabric management and processes using ML USED BY Citizen Analysts and Integrators “Denodo’s AI/ML capabilities, as well as automation, continue to enhance its capabilities across data fabric components!” – The Forrester Wave™: Enterprise Data Fabric, Q2 2020
  • 40.
    40 Query Optimization 7 DETAILS • Dynamic Query Optimizer for best-in-class optimization • Smart query acceleration using Summaries for complex analytical scenarios • Offers partial and full aggregation pushdown • Native integration with existing MPP and in-memory systems for query acceleration • Widest range of caching configuration options - partial (for frequently used reports) or full cache (for data intensive analytical applications) BENEFITS • Significant reduction in query execution time • Significant reduction of network traffic • Exploits the processing power of data sources to maximize local processing • Cost reduction in cloud use cases • “Full cache mode” avoids accessing data sources SUMMARY Unparalleled performance through query optimization and caching USED BY Data Engineers and Integrators “Caching mechanisms range from proprietary file structures to standard relational database management system (RDBMS) tables, and also include caching of data in-memory. These caching mechanisms enhance the performance of data virtualization. Full and partial refreshing of caching can be triggered by schedules, events or rules.” – Adopt Data Virtualization to Improve Agility and Bimodal Traits in Your Aging Data Integration, 2017
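Aggregation pushdown, mentioned on this slide, means asking each source to compute partial aggregates and combining them in the virtual layer, instead of pulling every row across the network. The sketch below is a generic illustration of that idea and not Denodo's optimizer: two in-memory SQLite databases play the role of sources, and the helper names are hypothetical.

```python
import sqlite3
from collections import Counter


def make_source(rows):
    """Create one in-memory 'source system' holding a sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


SOURCES = [
    make_source([("EU", 10), ("EU", 20), ("US", 5)]),
    make_source([("EU", 1), ("US", 7), ("US", 8)]),
]


def total_without_pushdown():
    """Fetch every row and aggregate centrally; count rows transferred."""
    totals, transferred = Counter(), 0
    for src in SOURCES:
        for region, amount in src.execute("SELECT region, amount FROM sales"):
            totals[region] += amount
            transferred += 1
    return dict(totals), transferred


def total_with_pushdown():
    """Push GROUP BY to each source; merge only the partial sums."""
    totals, transferred = Counter(), 0
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for src in SOURCES:
        for region, partial in src.execute(query):
            totals[region] += partial
            transferred += 1
    return dict(totals), transferred
```

Both functions return the same totals, but the pushdown variant moves one row per group per source rather than one row per sale, which is where the reduction in network traffic comes from.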
  • 41.
    41 Data Catalog 8 DETAILS • Google-like search capability for data and metadata • Business-friendly new UI geared to roles such as data stewards, data analysts, and citizen analysts • Ability to create business categories or tags • Graphical representation of lineage, relationships • Usage-based metadata – who, when, what, why, and how of data consumption • Enhanced collaboration features through user warnings and comments • Machine learning powered personalized recommendation for data sets BENEFITS • Data at the speed of business • Users can easily search for data or metadata • Facilitates Sharing and Collaboration • Enhanced user experience with smart ranking of search results SUMMARY The only data virtualization solution that seamlessly integrates data catalog with data delivery. USED BY Data Stewards and Analysts; Citizen Integrators; Citizen Analysts “Through 2022, over 80% data lake projects will fail to deliver value as finding, inventorying and curating data will prove to be the biggest inhibitor to analytics and data science success.” - Gartner Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders, September 12, 2019
  • 42.
    42 Data Consumers 9 DETAILS • Multiprotocol support including JDBC, ODBC, OData, GraphQL and GeoJSON • SOAP and RESTful web services • Output in XML, JSON and HTML for human and machine consumption • Portal widget for major portal support • Native Denodo connector for major BI tools such as Tableau, MicroStrategy, Cognos, and Power BI BENEFITS • Empower business users with relevant and contextual data at their fingertips • Single source of truth across multiple BI tools • Consistent security and governance across all consuming applications • Reduced API call overhead through GraphQL SUMMARY Consistent view of information across any consuming application; Rationalize and Integrate multiple BI platforms. USED BY Citizen Integrators; Citizen Analysts
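As an illustration of the OData access path listed on this slide, the sketch below composes a query URL from the standard OData system query options `$select`, `$filter` and `$top`. The service root, database name and view name are hypothetical placeholders, not Denodo's actual endpoint layout, and `odata_url` is an illustrative helper rather than part of any product API.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root, used for illustration only.
SERVICE_ROOT = "https://dv.example.com/odata/sales_db"


def odata_url(view, select=None, filter_expr=None, top=None):
    """Compose an OData query URL from standard system query options."""
    options = {}
    if select:
        options["$select"] = ",".join(select)
    if filter_expr:
        options["$filter"] = filter_expr
    if top is not None:
        options["$top"] = top
    # Keep "$", "," and "'" readable; spaces become %20.
    query = urlencode(options, quote_via=quote, safe="$,'")
    return f"{SERVICE_ROOT}/{view}" + (f"?{query}" if query else "")
```

A call such as `odata_url("customer_360", ["id", "name"], "region eq 'EMEA'", 10)` yields a URL that any HTTP client or OData-aware BI tool could request, assuming the server exposes the view under that path.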
  • 43.
    43 Data Science Tools 10 DETAILS • Data scientists can combine queries, scripts, graphics and text to create narratives • Denodo Data Science Tool is built based on Apache Zeppelin • Denodo users can create, save, and share their own notebooks with fellow users • Fully integrated with Denodo’s security system and SSO capabilities BENEFITS • Help data scientists save time in finding data for analytics and model building • Data scientists can easily share their findings with peers using the notebook dashboard • Contextualizing data science models and consumption is easier through Denodo layer SUMMARY Data Science Notebook that is fully integrated with Denodo’s security system and SSO capabilities USED BY Data Scientists and Data Engineers
  • 44.
    44 Data as a Service 11 DETAILS • Expose reusable Data Services supporting multiple protocols (JDBC, ODBC, ADO.NET, REST and SOAP/XML Web Services, OData, GraphQL) • Easily extend or specialize Data Services for specific use cases • Full metadata introspection support • OpenAPI (Swagger) support • Data Lineage support BENEFITS • Data at the speed of business • Users can easily search for data or metadata • Facilitates Sharing and Collaboration SUMMARY Create reusable, extensible Data Services for all types of consumers USED BY Data Engineers and Integrators; Application and API Integrators