DATA VIRTUALIZATION
APAC WEBINAR SERIES
Sessions Covering Key Data
Integration Challenges Solved
with Data Virtualization
Simplifying Your Cloud Architecture
with a Logical Data Fabric
Katrina Briedis
APAC Sales Engineering, Denodo
Sushant Kumar
Product Marketing Manager, Denodo
Agenda
1. What is a Data Fabric?
2. Cloud Migration Choices
3. Customer story
4. Product Demo
5. Q&A
6. Next Steps
Data Fabric Definition
A data fabric is an architecture pattern that informs and automates the design, integration and deployment of data objects regardless of deployment platforms and architectural approaches.
It utilizes continuous analytics and AI/ML over all metadata assets to provide actionable insights and recommendations on data management and integration design and deployment patterns.
This results in faster, informed and, in some cases, completely automated data access and sharing.
Pictorial View of a Data Fabric – from Gartner
[Diagram: a "data fabric net" spanning business entities (compounds, customers, products, claims) woven over diverse sources – RDBMS/OLTP, flat files, legacy and third-party systems, a traditional analytics/BI stack (data warehouse fed by ETL into marts), data lakes, cloud data stores, and apps and document repositories (XML, JSON, PDF, DOC, web).]
Data Fabric Definition
"Dynamically orchestrating disparate data sources intelligently and securely in a self-service manner and leveraging various data platforms to deliver integrated and trusted data to support various applications, analytics, and use cases"
– Forrester Research, June 2020
Forrester Data Fabric Architecture
[Diagram: on-premises and cloud data sources feed a layered stack – data ingestion/streaming; data processing/persistence (Hadoop, Spark, NoSQL, data lake, EDW/BDW); transformation, integration, and cleansing; data modeling, preparation, curation, and graph engine; and global data access via a globally distributed platform (in-memory, embedded, self-service, and APIs) – with AI/ML applied at every layer and cross-cutting data management concerns: metadata/catalog, security, governance, quality, lineage, orchestration, discovery, and policies.]
The Logical Data Fabric Architecture
[Diagram: the same Forrester layer stack, realized as a logical data fabric – on-premises and cloud sources, data ingestion/streaming, data processing/persistence (Hadoop, Spark, NoSQL, data lake, EDW/BDW), transformation/integration/cleansing, data modeling/preparation/curation, and global data access, with AI/ML throughout and unified metadata/catalog, security, governance, quality, lineage, and policies.]
Data Virtualization: Logical Data Fabric
A logical data layer – a "logical data fabric" – that provides high-performance, real-time, and secure access to integrated business views of disparate data across the enterprise.
• Data Abstraction: decoupling applications/data usage from data sources
• Data Integration without replication or relocation of physical data
• Easy Access to Any Data: high-performance and real-time/right-time
• Data Catalog for self-service data services and easy discovery
• Unified metadata, security & governance across all data assets
• Data Delivery in any format, with intelligent query optimization that leverages new and existing physical data platforms
Stages of a Cloud Journey
1. On-Premise: All systems are on-premise, using traditional databases, etc. – maybe an on-premise Hadoop cluster. Lots of ETL pipelines. Using Denodo for an integrated view of data.
2. Transition to Cloud: System modernization initiatives move applications and data to the Cloud. For critical systems, this migration is typically a phased approach over a period of months (or years).
3. Hybrid: Systems are now on-premise and in the Cloud – initially hosted by the preferred Cloud provider. The data is balanced across the different environments, although the bulk of the data is initially on-premise. ETL-style data movement is often used to move data from on-premise systems to Cloud-based analytical systems. The systems are more complex, and users need to be able to find and access data from on-premise and Cloud locations.
4. Single Cloud: Systems have moved to the Cloud (although some systems are still on-premise and cannot be moved to the Cloud). The 'center of gravity' for data is solidly in the Cloud. More processing and data integration occurs in the Cloud. Data is moved from on-premise systems to the Cloud using ETL. User data access is predominantly from Cloud systems. (Note: Most organizations skip this stage and go straight to multi-Cloud.)
5. Multi-Cloud: In reality, this is a hybrid/multi-Cloud environment, with systems in multiple Clouds (AWS, Azure, GCP, Salesforce, etc.) and a few legacy systems still on-premise. The environment is even more complex, as workloads can move between Cloud providers to take advantage of new capabilities, cost optimization, etc. Users still need to find and access data in this environment.
Cloud Migration Options
• Re-Host – ‘Lift and Shift’ – Take existing data and copy it to Cloud “as is” into same
database
• Good for smaller data sets or data sets with low importance
• Re-Platform – Relocate to new database running on Cloud – everything else stays
the same
• e.g. move from Oracle 12c to Snowflake
• Re-Factor/Re-Architect – Move to a different database *and* change the data
schema
• e.g. move from Oracle to Redshift and re-factor data model, partitioning, etc.
Cloud Migration Using Data Virtualization
• Large or critical Cloud migrations are risky
• Big Bang approach is not advised
• Phased approach is recommended
• Select data set to migrate, copy to Cloud
• Test and tune data access, then go live
• Repeat for next data set and so on
• Use Denodo as abstraction layer during
migration process
• Isolate users from shift of data
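The phased approach can be sketched as follows: consumers always query a virtual view, and an internal routing decision determines whether the view resolves to the legacy on-premise source or the new cloud source. This is an illustrative Python sketch of the pattern, not Denodo's implementation; all names (`MIGRATED`, `fetch_onprem`, `fetch_cloud`, `query_view`) are hypothetical.

```python
# Illustrative sketch: an abstraction layer isolates consumers from the
# physical location of data during a phased cloud migration.
# All function and dataset names here are hypothetical.

# Tracks which data sets have been migrated, tested, and gone live.
MIGRATED = {"orders"}  # "customers" has not yet been migrated

def fetch_onprem(dataset):
    # Placeholder for a query against the legacy on-premise database.
    return f"{dataset} rows from on-premise"

def fetch_cloud(dataset):
    # Placeholder for a query against the new cloud database.
    return f"{dataset} rows from cloud"

def query_view(dataset):
    """Consumers always call this; the routing decision is internal."""
    if dataset in MIGRATED:
        return fetch_cloud(dataset)
    return fetch_onprem(dataset)

print(query_view("orders"))     # served from the cloud copy
print(query_view("customers"))  # still served from on-premise
```

As each data set goes live in the cloud, it is simply added to the migrated set; consumer queries never change.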
Hybrid Data Integration with a Logical Data Fabric
Common access point for both on-premise (data center) and cloud sources:
• Access to all sources as a single schema with no replication: virtual data lake
• Enables combination of data across sources, regardless of nature and location
• Allows definition of common semantic model
• Single security model and single Active Directory
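The "single schema with no replication" idea can be illustrated with a minimal sketch: rows from an on-premise source and a cloud source are combined at query time, without first copying either into a shared store. Source contents, field names, and function names below are hypothetical.

```python
# Illustrative sketch: a virtual view joins an on-premise source and a
# cloud source on the fly, with no replication into a new store.
# All data and names are hypothetical.

def onprem_customers():
    # Placeholder for rows fetched live from an on-premise database.
    yield from [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Bolt"}]

def cloud_orders():
    # Placeholder for rows fetched live from a cloud data store.
    yield from [{"customer_id": 1, "total": 250.0},
                {"customer_id": 1, "total": 100.0}]

def customer_order_totals():
    """A 'virtual view': aggregates and joins both sources at query time."""
    totals = {}
    for order in cloud_orders():
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["total"]
    return [{"name": c["name"], "total": totals.get(c["id"], 0.0)}
            for c in onprem_customers()]

print(customer_order_totals())
# -> [{'name': 'Acme', 'total': 350.0}, {'name': 'Bolt', 'total': 0.0}]
```

A real data virtualization engine would push aggregation down to each source where possible; this sketch only shows the logical combination.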
Multi-Cloud Integration with Logical Data Fabric
[Diagram: the logical data fabric spanning an on-prem data center and Amazon RDS/Aurora deployments in US East and EMEA availability zones.]
BHP Builds a Logical Data Fabric Using Data Virtualization
BHP wanted to manage business risk by integrating data systems across multiple geographies, but this was a time-consuming and expensive operation.
BHP's global application landscape provides limited and restricted reusability of existing data platforms, which leads to:
• Repeated engineering effort to access the same data sources for different data solutions
• Long lead times to ingest or load data before a data solution can be developed
• Project-centric data repositories created to provide a consolidated set of data for a specific purpose, increasing total cost of ownership, complexity and variability in data interpretation
BHP is among the world's top producers of major commodities, including iron ore, coal and copper. They have a global presence, with operations and offices across Australia, Asia, the UK, Canada, the USA, and Central and South America.
Reference Architecture
Data Sources:
 Application data stores
 Enterprise & regional data stores
 SaaS / Cloud applications
 Application interfaces
 Manual data sources
Data Virtualization Platform:
 Self-service data catalogue
 Query optimisation and query development
 Data federation and data discovery
 Abstraction / semantic layer
 Security layer: Kerberos delegation + encryption in transit + extensive auditing
Consumers:
 Analytics
 Self service
 Business intelligence
 Transactional applications
 Bring your own tool
Benefits: secure (connect to data stores or direct to source), faster (get access to the right data, fast), self service, flexible protocols.
Built using technology by Denodo.
Query Federation to Local Data Sources
Every Data Virtualization cluster is connected to local data sources and is the access point for local consumer apps such as BI and analytics tools. Each Data Virtualization cluster has visibility of the datasets available from all other clusters, and requests this data from its peer cluster as required by end users.
[Diagram: peer clusters in Brisbane, Perth, Santiago, Houston and a cloud tenancy, each fronting local data lakes, data marts and analytics tools.]
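The peer-cluster behaviour described above can be sketched in a few lines: a cluster answers from its local sources when it can, and otherwise requests the dataset from the peer that holds it. This is an illustrative Python sketch; the `Cluster` class and the dataset names are hypothetical, not Denodo APIs.

```python
# Illustrative sketch of peer-to-peer query federation: each cluster
# serves local datasets directly and fetches others from the peer
# cluster that owns them. Cluster and dataset names are hypothetical.

class Cluster:
    def __init__(self, name, local_datasets):
        self.name = name
        self.local = local_datasets   # dataset name -> rows
        self.peers = []               # other clusters in the fabric

    def query(self, dataset):
        if dataset in self.local:     # serve from local sources first
            return self.local[dataset]
        for peer in self.peers:       # otherwise ask the owning peer
            if dataset in peer.local:
                return peer.query(dataset)
        raise KeyError(f"{dataset} not found anywhere in the fabric")

perth = Cluster("Perth", {"iron_ore_grades": ["row-1", "row-2"]})
santiago = Cluster("Santiago", {"copper_output": ["row-a"]})
perth.peers.append(santiago)
santiago.peers.append(perth)

print(perth.query("copper_output"))  # served transparently by Santiago
```

End users in Perth see Santiago's datasets in their local catalog; the remote hop is invisible to them.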
Conclusions
1. Cloud architectures – both hybrid and multi-Cloud – are complex beasts
▪ A Logical Data Fabric using Data Virtualization can simplify the architecture and make it easier for users to find and access the data that they need
2. In a multi-Cloud architecture, the Data Fabric should also be distributed – providing global access to data coupled with local control
3. A Data Fabric also provides a unified security layer for data access
▪ A single place to enforce data access control – allowing users to access the data that they need rather than data based on organizational silos
Product Demonstration
Sales Engineering, Denodo
Katrina Briedis
Demo Scenario
• Tim – Manager (South Region):
▪ Access to Southern Region employee data
▪ Unnecessary data hidden or masked, e.g. monthly salary, bonus rate, DOB & email address
▪ No access to Northern Region data at all
• Mary – Manager (North Region):
▪ Access to Northern Region staff data
▪ Unnecessary data hidden or masked, e.g. monthly salary, bonus rate, DOB & email address
▪ No access to Southern Region data at all
• Jane – Data Analyst (Corporate):
▪ Access to all de-identified employee data
▪ PII data hidden
▪ Access to data in all locations (North & South)
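The row- and column-level security in this scenario can be sketched as a per-role view: managers see only their own region with sensitive columns removed, while the corporate analyst sees all regions with PII columns removed. This is an illustrative Python sketch; the field names, role names and sample data are hypothetical simplifications of the demo.

```python
# Illustrative sketch of role-based row filtering and column masking.
# Field, role and employee names are hypothetical.

SENSITIVE = {"monthly_salary", "bonus_rate", "dob", "email"}  # hidden from managers
PII = {"name", "dob", "email"}                                # hidden from analysts

EMPLOYEES = [
    {"name": "Alice", "region": "South", "monthly_salary": 9000,
     "bonus_rate": 0.1, "dob": "1990-01-01", "email": "a@example.com"},
    {"name": "Bob", "region": "North", "monthly_salary": 8000,
     "bonus_rate": 0.2, "dob": "1985-05-05", "email": "b@example.com"},
]

def view_for(role):
    if role in ("south_manager", "north_manager"):
        region = "South" if role == "south_manager" else "North"
        # Row security: own region only; column security: drop sensitive fields.
        return [{k: v for k, v in e.items() if k not in SENSITIVE}
                for e in EMPLOYEES if e["region"] == region]
    if role == "corporate_analyst":
        # All regions, de-identified: drop PII columns.
        return [{k: v for k, v in e.items() if k not in PII}
                for e in EMPLOYEES]
    return []  # unknown roles see nothing

print(view_for("south_manager"))
print(view_for("corporate_analyst"))
```

In the actual demo these policies are defined once in the virtualization layer and enforced for every consuming tool, rather than re-implemented per application.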
23
Tim
Mary
Jane
Corporate HQ
DATA VIRTUALIZATION
Multi-Location Data Access
Demonstrate
[Demo walkthrough covering ten steps: connect to disparate data, harvest the metadata, build a logical view, standardize data, apply business rules, apply data security, publish data for re-use, single access point, data discovery, and 3rd-party tool access.]
Differences in tables (South – Oracle vs. North – Snowflake):
• Different table names
• Different field names
• Different values / reference data
Standardised views (South – Oracle and North – Snowflake):
• Same naming convention
• Same field names
• Standardised values
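The standardization step can be sketched as a per-source mapping from each system's table and field names onto one canonical schema, plus a lookup that normalizes differing reference values. The actual Oracle and Snowflake schema names in the demo are not shown, so every identifier below is hypothetical.

```python
# Illustrative sketch: mapping two sources with different field names
# and reference values onto one standardised view.
# All schema and field names are hypothetical.

# Per-source mapping: canonical field -> source field.
MAPPINGS = {
    "south_oracle":    {"emp_id": "EMPNO", "full_name": "ENAME", "region": "REG"},
    "north_snowflake": {"emp_id": "EMPLOYEE_ID", "full_name": "NAME", "region": "REGION_CODE"},
}

# Value standardisation, e.g. differing region codes across sources.
REGION_VALUES = {"S": "South", "N": "North", "SOUTH": "South", "NORTH": "North"}

def standardize(source, row):
    """Rename a source row's fields to the canonical schema and normalise values."""
    mapping = MAPPINGS[source]
    out = {canonical: row[src_field] for canonical, src_field in mapping.items()}
    out["region"] = REGION_VALUES.get(out["region"], out["region"])
    return out

print(standardize("south_oracle", {"EMPNO": 7, "ENAME": "Tim", "REG": "S"}))
# -> {'emp_id': 7, 'full_name': 'Tim', 'region': 'South'}
```

Once both sources are standardised this way, a single union view can present North and South employees under one naming convention.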
Next Steps
https://denodo.link/2Ry1PZI
https://denodo.link/3f4po5H
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
Thursday 17 June, 1:00pm AEST | 11:00am SGT | 8:30am IST
https://denodo.link/3bCP8no
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
