DATA VIRTUALIZATION
APAC WEBINAR SERIES
Sessions Covering Key Data
Integration Challenges Solved
with Data Virtualization
Simplifying Your Cloud Architecture
with a Logical Data Fabric
Katrina Briedis
APAC Sales Engineering, Denodo
Sushant Kumar
Product Marketing Manager, Denodo
Agenda
1. What is a Data Fabric?
2. Cloud Migration Choices
3. Customer story
4. Product Demo
5. Q&A
6. Next Steps
Data Fabric Definition
A data fabric is an architecture pattern that informs and automates the design, integration and deployment of data objects regardless of deployment platforms and architectural approaches.
It utilizes continuous analytics and AI/ML over all metadata assets to provide actionable insights and recommendations on data management and integration design and deployment patterns.
This results in faster, informed and, in some cases, completely automated data access and sharing.
Pictorial View of a Data Fabric – from Gartner
[Diagram: a "data fabric net" spanning business entities (compounds, customers, products, claims) woven over diverse sources – RDBMS/OLTP, flat files, legacy and third-party systems, a traditional analytics/BI stack (data warehouse fed by ETL into marts), data lakes, cloud data stores, and apps and document repositories (XML, JSON, PDF, DOC, web).]
Data Fabric Definition
"Dynamically orchestrating disparate data sources intelligently and securely in a self-service manner and leveraging various data platforms to deliver integrated and trusted data to support various applications, analytics, and use cases"
– Forrester Research, June 2020
Forrester Data Fabric Architecture
[Diagram: on-premises and cloud data sources feed a layered stack – data ingestion/streaming; data processing/persistence (Hadoop, Spark, NoSQL, data lake, EDW/BDW); transformation, integration, and cleansing; data modeling, preparation, curation, and graph engine; and global data access via a globally distributed platform (in-memory, embedded, self-service, and APIs) – with AI/ML applied at every layer and cross-cutting data management concerns: metadata/catalog, security, governance, quality, lineage, orchestration, discovery, and policies.]
The Logical Data Fabric Architecture
[Diagram: the same Forrester layer stack, realized as a logical data fabric – on-premises and cloud sources, data ingestion/streaming, data processing/persistence (Hadoop, Spark, NoSQL, data lake, EDW/BDW), transformation/integration/cleansing, data modeling/preparation/curation, and global data access, with AI/ML throughout and unified metadata/catalog, security, governance, quality, lineage, and policies.]
Data Virtualization: Logical Data Fabric
A logical data layer – a "logical data fabric" – that provides high-performance, real-time, and secure access to integrated business views of disparate data across the enterprise.
• Data Abstraction: decoupling applications/data usage from data sources
• Data Integration without replication or relocation of physical data
• Easy Access to Any Data: high-performance and real-time/right-time
• Data Catalog for self-service data services and easy discovery
• Unified metadata, security & governance across all data assets
• Data Delivery in any format, with intelligent query optimization that leverages new and existing physical data platforms
Stages of a Cloud Journey
1. On-Premise: All systems are on-premise, using traditional databases, etc. – maybe an on-premise Hadoop cluster. Lots of ETL pipelines. Using Denodo for an integrated view of data.
2. Transition to Cloud: System modernization initiatives move applications and data to the Cloud. For critical systems, this migration is typically a phased approach over a period of months (or years).
3. Hybrid: Systems are now on-premise and in the Cloud – initially hosted by the preferred Cloud provider. The data is balanced across the different environments, although the bulk of the data is initially on-premise. ETL-style data movement is often used to move data from on-premise systems to Cloud-based analytical systems. The systems are more complex, and users need to be able to find and access data from on-premise and Cloud locations.
4. Single Cloud: Systems have moved to the Cloud (although some systems are still on-premise and cannot be moved to the Cloud). The 'center of gravity' for data is solidly in the Cloud. More processing and data integration occurs in the Cloud. Data is moved from on-premise systems to the Cloud using ETL. User data access is predominantly from Cloud systems. (Note: Most organizations skip this stage and go straight to multi-Cloud.)
5. Multi-Cloud: In reality, this is a hybrid/multi-Cloud environment, with systems in multiple Clouds (AWS, Azure, GCP, Salesforce, etc.) and a few legacy systems still on-premise. The environment is even more complex, as workloads can move between Cloud providers to take advantage of new capabilities, cost optimization, etc. Users still need to find and access data in this environment.
Cloud Migration Options
• Re-Host – ‘Lift and Shift’ – Take existing data and copy it to Cloud “as is” into same
database
• Good for smaller data sets or data sets with low importance
• Re-Platform – Relocate to new database running on Cloud – everything else stays
the same
• e.g. move from Oracle 12c to Snowflake
• Re-Factor/Re-Architect – Move to a different database *and* change the data
schema
• e.g. move from Oracle to Redshift and re-factor data model, partitioning, etc.
Cloud Migration Using Data Virtualization
• Large or critical Cloud migrations are risky
• Big Bang approach is not advised
• Phased approach is recommended
• Select data set to migrate, copy to Cloud
• Test and tune data access, then go live
• Repeat for next data set and so on
• Use Denodo as abstraction layer during
migration process
• Isolate users from shift of data
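The phased approach can be sketched as follows: consumers always query a virtual view, and an internal routing decision determines whether the view resolves to the legacy on-premise source or the new cloud source. This is an illustrative Python sketch of the pattern, not Denodo's implementation; all names (`MIGRATED`, `fetch_onprem`, `fetch_cloud`, `query_view`) are hypothetical.

```python
# Illustrative sketch: an abstraction layer isolates consumers from the
# physical location of data during a phased cloud migration.
# All function and dataset names here are hypothetical.

# Tracks which data sets have been migrated, tested, and gone live.
MIGRATED = {"orders"}  # "customers" has not yet been migrated

def fetch_onprem(dataset):
    # Placeholder for a query against the legacy on-premise database.
    return f"{dataset} rows from on-premise"

def fetch_cloud(dataset):
    # Placeholder for a query against the new cloud database.
    return f"{dataset} rows from cloud"

def query_view(dataset):
    """Consumers always call this; the routing decision is internal."""
    if dataset in MIGRATED:
        return fetch_cloud(dataset)
    return fetch_onprem(dataset)

print(query_view("orders"))     # served from the cloud copy
print(query_view("customers"))  # still served from on-premise
```

As each data set goes live in the cloud, it is simply added to the migrated set; consumer queries never change.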
Hybrid Data Integration with a Logical Data Fabric
Common access point for both on-premise (data center) and cloud sources:
• Access to all sources as a single schema with no replication: virtual data lake
• Enables combination of data across sources, regardless of nature and location
• Allows definition of common semantic model
• Single security model and single Active Directory
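The "single schema with no replication" idea can be illustrated with a minimal sketch: rows from an on-premise source and a cloud source are combined at query time, without first copying either into a shared store. Source contents, field names, and function names below are hypothetical.

```python
# Illustrative sketch: a virtual view joins an on-premise source and a
# cloud source on the fly, with no replication into a new store.
# All data and names are hypothetical.

def onprem_customers():
    # Placeholder for rows fetched live from an on-premise database.
    yield from [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Bolt"}]

def cloud_orders():
    # Placeholder for rows fetched live from a cloud data store.
    yield from [{"customer_id": 1, "total": 250.0},
                {"customer_id": 1, "total": 100.0}]

def customer_order_totals():
    """A 'virtual view': aggregates and joins both sources at query time."""
    totals = {}
    for order in cloud_orders():
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["total"]
    return [{"name": c["name"], "total": totals.get(c["id"], 0.0)}
            for c in onprem_customers()]

print(customer_order_totals())
# -> [{'name': 'Acme', 'total': 350.0}, {'name': 'Bolt', 'total': 0.0}]
```

A real data virtualization engine would push aggregation down to each source where possible; this sketch only shows the logical combination.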
Multi-Cloud Integration with Logical Data Fabric
[Diagram: the logical data fabric spanning an on-prem data center and Amazon RDS/Aurora deployments in US East and EMEA availability zones.]
BHP Builds a Logical Data Fabric Using Data Virtualization
BHP wanted to manage business risk by integrating data systems across multiple geographies, but this was a time-consuming and expensive operation.
BHP's global application landscape provides limited and restricted reusability of existing data platforms, which leads to:
• Repeated engineering effort to access the same data sources for different data solutions
• Long lead times to ingest or load data before a data solution can be developed
• Project-centric data repositories created to provide a consolidated set of data for a specific purpose, increasing total cost of ownership, complexity and variability in data interpretation
BHP is among the world's top producers of major commodities, including iron ore, coal and copper. They have a global presence, with operations and offices across Australia, Asia, the UK, Canada, the USA, and Central and South America.
Reference Architecture
Data Sources:
 Application data stores
 Enterprise & regional data stores
 SaaS / Cloud applications
 Application interfaces
 Manual data sources
Data Virtualization Platform:
 Self-service data catalogue
 Query optimisation and query development
 Data federation and data discovery
 Abstraction / semantic layer
 Security layer: Kerberos delegation + encryption in transit + extensive auditing
Consumers:
 Analytics
 Self service
 Business intelligence
 Transactional applications
 Bring your own tool
Benefits: secure (connect to data stores or direct to source), faster (get access to the right data, fast), self service, flexible protocols.
Built using technology by Denodo.
Query Federation to Local Data Sources
Every Data Virtualization cluster is connected to local data sources and is the access point for local consumer apps such as BI and analytics tools. Each Data Virtualization cluster has visibility of the datasets available from all other clusters, and requests this data from its peer cluster as required by end users.
[Diagram: peer clusters in Brisbane, Perth, Santiago, Houston and a cloud tenancy, each fronting local data lakes, data marts and analytics tools.]
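The peer-cluster behaviour described above can be sketched in a few lines: a cluster answers from its local sources when it can, and otherwise requests the dataset from the peer that holds it. This is an illustrative Python sketch; the `Cluster` class and the dataset names are hypothetical, not Denodo APIs.

```python
# Illustrative sketch of peer-to-peer query federation: each cluster
# serves local datasets directly and fetches others from the peer
# cluster that owns them. Cluster and dataset names are hypothetical.

class Cluster:
    def __init__(self, name, local_datasets):
        self.name = name
        self.local = local_datasets   # dataset name -> rows
        self.peers = []               # other clusters in the fabric

    def query(self, dataset):
        if dataset in self.local:     # serve from local sources first
            return self.local[dataset]
        for peer in self.peers:       # otherwise ask the owning peer
            if dataset in peer.local:
                return peer.query(dataset)
        raise KeyError(f"{dataset} not found anywhere in the fabric")

perth = Cluster("Perth", {"iron_ore_grades": ["row-1", "row-2"]})
santiago = Cluster("Santiago", {"copper_output": ["row-a"]})
perth.peers.append(santiago)
santiago.peers.append(perth)

print(perth.query("copper_output"))  # served transparently by Santiago
```

End users in Perth see Santiago's datasets in their local catalog; the remote hop is invisible to them.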
Conclusions
1. Cloud architectures – both hybrid and multi-Cloud – are complex beasts
▪ A Logical Data Fabric using Data Virtualization can simplify the architecture and make it easier for users to find and access the data that they need
2. In a multi-Cloud architecture, the Data Fabric should also be distributed – providing global access to data coupled with local control
3. A Data Fabric also provides a unified security layer for data access
▪ A single place to enforce data access control – allowing users to access the data that they need rather than data based on organizational silos
Product Demonstration
Sales Engineering, Denodo
Katrina Briedis
Demo Scenario
• Tim – Manager (South Region):
▪ Access to Southern Region employee data
▪ Unnecessary data hidden or masked, e.g. monthly salary, bonus rate, DOB & email address
▪ No access to Northern Region data at all
• Mary – Manager (North Region):
▪ Access to Northern Region staff data
▪ Unnecessary data hidden or masked, e.g. monthly salary, bonus rate, DOB & email address
▪ No access to Southern Region data at all
• Jane – Data Analyst (Corporate):
▪ Access to all de-identified employee data
▪ PII data hidden
▪ Access to data in all locations (North & South)
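The row- and column-level security in this scenario can be sketched as a per-role view: managers see only their own region with sensitive columns removed, while the corporate analyst sees all regions with PII columns removed. This is an illustrative Python sketch; the field names, role names and sample data are hypothetical simplifications of the demo.

```python
# Illustrative sketch of role-based row filtering and column masking.
# Field, role and employee names are hypothetical.

SENSITIVE = {"monthly_salary", "bonus_rate", "dob", "email"}  # hidden from managers
PII = {"name", "dob", "email"}                                # hidden from analysts

EMPLOYEES = [
    {"name": "Alice", "region": "South", "monthly_salary": 9000,
     "bonus_rate": 0.1, "dob": "1990-01-01", "email": "a@example.com"},
    {"name": "Bob", "region": "North", "monthly_salary": 8000,
     "bonus_rate": 0.2, "dob": "1985-05-05", "email": "b@example.com"},
]

def view_for(role):
    if role in ("south_manager", "north_manager"):
        region = "South" if role == "south_manager" else "North"
        # Row security: own region only; column security: drop sensitive fields.
        return [{k: v for k, v in e.items() if k not in SENSITIVE}
                for e in EMPLOYEES if e["region"] == region]
    if role == "corporate_analyst":
        # All regions, de-identified: drop PII columns.
        return [{k: v for k, v in e.items() if k not in PII}
                for e in EMPLOYEES]
    return []  # unknown roles see nothing

print(view_for("south_manager"))
print(view_for("corporate_analyst"))
```

In the actual demo these policies are defined once in the virtualization layer and enforced for every consuming tool, rather than re-implemented per application.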
23
Tim
Mary
Jane
Corporate HQ
DATA VIRTUALIZATION
Multi-Location Data Access
Demonstrate
[Demo walkthrough covering ten steps: connect to disparate data, harvest the metadata, build a logical view, standardize data, apply business rules, apply data security, publish data for re-use, single access point, data discovery, and 3rd-party tool access.]
Differences in tables (South – Oracle vs. North – Snowflake):
• Different table names
• Different field names
• Different values / reference data
Standardised views (South – Oracle and North – Snowflake):
• Same naming convention
• Same field names
• Standardised values
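The standardization step can be sketched as a per-source mapping from each system's table and field names onto one canonical schema, plus a lookup that normalizes differing reference values. The actual Oracle and Snowflake schema names in the demo are not shown, so every identifier below is hypothetical.

```python
# Illustrative sketch: mapping two sources with different field names
# and reference values onto one standardised view.
# All schema and field names are hypothetical.

# Per-source mapping: canonical field -> source field.
MAPPINGS = {
    "south_oracle":    {"emp_id": "EMPNO", "full_name": "ENAME", "region": "REG"},
    "north_snowflake": {"emp_id": "EMPLOYEE_ID", "full_name": "NAME", "region": "REGION_CODE"},
}

# Value standardisation, e.g. differing region codes across sources.
REGION_VALUES = {"S": "South", "N": "North", "SOUTH": "South", "NORTH": "North"}

def standardize(source, row):
    """Rename a source row's fields to the canonical schema and normalise values."""
    mapping = MAPPINGS[source]
    out = {canonical: row[src_field] for canonical, src_field in mapping.items()}
    out["region"] = REGION_VALUES.get(out["region"], out["region"])
    return out

print(standardize("south_oracle", {"EMPNO": 7, "ENAME": "Tim", "REG": "S"}))
# -> {'emp_id': 7, 'full_name': 'Tim', 'region': 'South'}
```

Once both sources are standardised this way, a single union view can present North and South employees under one naming convention.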
Next Steps
https://denodo.link/2Ry1PZI
https://denodo.link/3f4po5H
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
Thursday 17 June, 1:00pm AEST | 11:00am SGT | 8:30am IST
https://denodo.link/3bCP8no
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
