Modernizing Data Architecture Using Data Virtualization
Multipurpose Data Lake and Data Virtualization enabled Data Fabric
Chris Day, Director Sales Engineering, APAC
2
• Competition from a low cost
vendor
• Lower the price, affecting
margins?
• Or, maintain high price, but
differentiate in other ways?
3
Benefits
Large Heavy Equipment Manufacturer
Self-service / Predictive Analytics – IoT Integration
Improved asset performance and
proactive maintenance
Increased revenue from sale of
services and parts
Reduced warranty costs of parts
failure
4
Current Requirements in Data Management
1. Faster & more accurate decision making
▪ Significant increase in business speed & complexity of
requirements
2. Regulations, enterprise-wide governance & data security
▪ Thousands of new regulations worldwide: tax, finance, privacy, HR,
environmental, GDPR, etc.
3. IT cost reduction
▪ Huge data growth with associated storage and operational costs
5
Challenges: Fragmentation of the Data Landscape
[Diagram: IT Storage and Processing (ETL, Data Warehouse, Kafka, Physical Data Lake, ML/AI, SQL interface, Streaming Analytics, Distributed Storage, Files), with a Gov/Sec label repeated at each component; Bus. Tools, Ent. Apps, Portals, Mobile…, with a Bus. Logic label repeated at each]
IT has to implement Gov. & Sec. at every data source
Bus. adds Data Logic in every report, tool, etc.
6
Modern Data Architecture
7
Quiz
Where is the data for your data lake located?
1. ‘In the cloud’
2. On-premise
3. Both ‘in the cloud’ and on-premise
4. We don’t have a data lake
Quiz number 1
8
Gartner – The Rise of Logical Architectures
This is a Second Major Cycle of Analytical Consolidation
[Timeline diagram: operational applications, cubes, IoT data and other new data feeding data warehouses, a data lake and a logical data warehouse (Data Warehouse, Data Lake, Marts, ODS, Staging/Ingest)]
1980s, Pre EDW: Fragmented/nonexistent analysis
› Multiple sources
› Multiple structured sources
1990s, EDW: Unified analysis
› Consolidated data
› "Collect the data"
› Single server, multiple nodes
› More analysis than any one server can provide
2000s, Post EDW: Fragmented analysis
› "Collect the data" (into different repositories)
› New data types, processing, requirements
› Uncoordinated views
2010s, LDW: Unified analysis
› Logically consolidated view of all data
› "Connect and collect"
› Multiple servers, of multiple nodes
› More analysis than any one system can provide
©2018 Gartner, Inc. ID: 342254
9
Gartner – The Rise of Logical Architectures
This is a Second Major Cycle of Analytical Consolidation
Data Virtualization
√ Improved Time to Market by 50 to 90%
√ Improved Report Consistency
√ Reduced Duplication of Data
√ Improved Transparency
√ Reduced Development Cost
√ Future-proof architecture against technology changes
10
What are Data Lakes?
• A storage repository that holds a vast
amount of raw data in its native
format.
• Hadoop and its ecosystem provided
the foundation: vast storage and
processing muscle
• Advanced analytic tools and mining
software intake raw data from data
lakes and transform it into useful
insight.
11
• Hadoop seen as their personal
supercomputer.
• Data Lakes helped democratise
access to storage and computing
with off-the-shelf hardware.
• Hadoop–based solutions became
the standard to bring modern
analytics to any corporation
Data Lakes – A Data Scientist’s Playground
12
Data Lakes – Not a Perfect World
Physical Nature
• Based on Replication
• Require data to be copied to its physical storage
• Extends development cycles and costs
• Not all data is suitable for replication
• Real time needs: Cloud and SaaS APIs
• Large volumes: existing EDW
• Laws and restrictions
Single Purpose
• Usage of the data lake is often monopolised
• New silo of data, requires additional skills
• Governance, security & quality may differ from what users expect (e.g. EDW)
13
Multi‐purpose data lakes are data delivery environments developed
to support a broad range of users, from traditional self‐service BI users
(e.g. finance, marketing, human resource, transport) to sophisticated data
scientists.
Multi‐purpose data lakes allow a broader and deeper use of the data
lake investment without minimizing the potential value for data
science and without making it an inflexible environment.
Rick Van der Lans, R20 Consultancy
14
The Multipurpose Data Lake with Data Virtualization
“A multi-purpose data lake can become an organization’s universal data delivery system”
Architecting the Multi-Purpose Data Lake with Data Virtualization, Rick Van der Lans, April 2018
15
Denodo’s Coronavirus Data Portal
[Diagram labels: File; Denodo Express COVID-19 Edition; Data Catalog; Data Portal; JDBC, ODBC, API, GraphQL, GeoJSON; Sandbox (three instances)]
16
http://coronavirusdataportal.com/
17
The Multipurpose Data Lake with Data Virtualization
Logical Nature
• Replication is an option, not a necessity
• Broaden data access, shorten development times, better
insights
• Tight integration with big data systems. Fast execution with
large data volumes
Multi-purpose
• Curated access for non-technical users
• Better governance and access control
• Better ROI for the investment of the lake
18
Single access to all data assets,
internal & external including:
▪ Physical Data Lake (usually based on SQL-on-
Hadoop systems)
▪ Other databases (EDW, ODS, applications,
etc.)
▪ SaaS APIs (Salesforce, Google, social media,
etc.)
▪ Files (local, S3, Azure, etc.)
The Virtual Data Lake – Access to all Data Sources
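The idea of a single access point over several kinds of source can be sketched in a few lines. This is a minimal illustration only, not Denodo's API: the VirtualLayer class, the view names and the sample data are all hypothetical, with an in-memory SQLite database standing in for an EDW and a CSV string standing in for a file in object storage. The point it shows is that rows are fetched from each source at query time rather than copied into a lake first.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for real sources (assumptions for illustration):
# an in-memory SQLite database plays the EDW, a CSV string plays a file.
ORDERS_CSV = "customer_id,amount\n1,10.5\n2,4.0\n1,2.5\n"


def make_edw():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, zip TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "2000"), (2, "3000")])
    return conn


class VirtualLayer:
    """Maps view names to functions that read the source on demand."""

    def __init__(self):
        self._views = {}

    def register(self, name, fetch):
        self._views[name] = fetch

    def rows(self, name):
        # Nothing is replicated ahead of time; the source is read when queried.
        return list(self._views[name]())


def build_layer():
    edw = make_edw()
    layer = VirtualLayer()
    layer.register(
        "customer",
        lambda: ({"id": r[0], "zip": r[1]}
                 for r in edw.execute("SELECT id, zip FROM customer")),
    )
    layer.register(
        "orders",
        lambda: ({"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
                 for r in csv.DictReader(io.StringIO(ORDERS_CSV))),
    )
    return layer


def amount_by_zip(layer):
    """Combine both sources through the same interface."""
    zip_of = {c["id"]: c["zip"] for c in layer.rows("customer")}
    totals = {}
    for order in layer.rows("orders"):
        zip_code = zip_of[order["customer_id"]]
        totals[zip_code] = totals.get(zip_code, 0.0) + order["amount"]
    return totals


if __name__ == "__main__":
    print(amount_by_zip(build_layer()))
```

A consumer calling amount_by_zip never needs to know that one view comes from a database and the other from a file, which is the property the slide describes.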
19
Denodo optimizer provides native integration with
MPP systems to provide one extra key capability:
Query Acceleration
Denodo can move processing to the MPP system, on demand,
during execution:
• Parallel power for calculations in the
virtual layer
• Avoids slow processing on disk for large
data volumes
The Virtual Data Lake – Using the Lake Processing Engine
20
The Logical Data Lake – Putting the Pieces Together
[Query plan diagram, with MPP integration: Sales (300 million rows) is grouped by customer ID into 2M rows (sales by customer), joined with Customer (2M rows), then grouped by ZIP]

System      Execution Time   Optimization Techniques
Others      ~ 10 min         Basic
No MPP      43 sec           Aggregation push-down
With MPP    11 sec           Aggregation push-down + MPP integration (Impala 8 nodes)

1. Partial Aggregation push down
Maximizes source processing
Reduces network traffic
2. Integrated with Cost Based Optimizer
Based on data volume estimation and the cost of these particular operations,
the CBO can decide to move all or part of the execution tree to the MPP
3. On-demand data transfer
For SQL-on-Hadoop systems, Denodo automatically generates
and uploads Parquet files
4. Integration with local and pre-cached data
The engine detects when data is cached or is a native table in the MPP
5. Fast parallel execution
Support for Spark, Presto and Impala for fast analytical processing in
inexpensive Hadoop-based solutions
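Partial aggregation push-down can be illustrated with a small sketch. It is a toy model under stated assumptions, not Denodo's optimizer: the function names and the sample rows are hypothetical, and plain Python lists stand in for the Sales and Customer sources. It shows why the technique reduces network traffic: the sales source returns one pre-aggregated row per customer instead of one row per sale, and the final totals by ZIP are unchanged.

```python
from collections import defaultdict

# Hypothetical toy data standing in for the two sources on the slide.
# Sales rows are (customer_id, amount); customer rows are (customer_id, zip_code).
SALES = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5), (3, 4.0), (2, 1.0)]
CUSTOMERS = [(1, "2000"), (2, "2000"), (3, "3000")]


def naive_total_by_zip(sales, customers):
    """Join every sales row to its customer, then group by ZIP."""
    zip_of = dict(customers)
    totals = defaultdict(float)
    for customer_id, amount in sales:  # every sales row leaves the source
        totals[zip_of[customer_id]] += amount
    return dict(totals)


def partial_aggregate_at_source(sales):
    """Runs at the sales source: one row per customer instead of one per sale."""
    per_customer = defaultdict(float)
    for customer_id, amount in sales:
        per_customer[customer_id] += amount
    return list(per_customer.items())


def pushed_down_total_by_zip(sales, customers):
    """Join the pre-aggregated rows to customers, then finish the group by ZIP."""
    zip_of = dict(customers)
    totals = defaultdict(float)
    for customer_id, subtotal in partial_aggregate_at_source(sales):
        totals[zip_of[customer_id]] += subtotal
    return dict(totals)


if __name__ == "__main__":
    print(len(SALES), "sales rows reduced to",
          len(partial_aggregate_at_source(SALES)), "rows before the join")
    print(pushed_down_total_by_zip(SALES, CUSTOMERS))
```

The rewrite is valid here because a sum can be computed in two stages; in the slide's scenario the same step shrinks 300 million sales rows to 2 million rows before the join.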
21
The Forrester Wave, Enterprise Data Fabric, Q2, 2020
Data fabric focuses on automating the process integration,
transformation, preparation, curation, security, governance,
and orchestration to enable analytics and insights quickly for
business success.
22
Forrester’s Big Data Fabric
23
Forrester’s Big Data Fabric
Data Virtualization
24
Big Data Fabric – Data Abstraction Layer
Abstracts access to disparate
data sources
Acts as a single repository
(virtual)
Makes data available in
real-time to consumers
25
BI and Analytics Reference Architecture
IT: Flexible Source Architecture
Business: Flexible
Tool Choice
IT can now move at slower speed w/o affecting business
Business can now make faster & more sophisticated decisions as all data is accessible by any tool of choice
Cloud DW (Snowflake, etc.)
26
BI and Analytics Reference Architecture
Data-as-a-Service
IT Semantic – where stored & processed
Bus. Semantic – how consumed & used
27
Data Fabric – Use Cases
Data Warehouse Offloading
IoT Integration
28
Photo by Obi Onyeador on Unsplash
29
Customer Case Study - Asurion
• 290 million consumers
• Annual revenues (FY 2016) $5.8 B
• Over 17,000 employees
• 49 Offices, 18 Countries
• Insurance & Warranties on digital devices
BUSINESS NEED
• Reduce time to create new services and products from months to weeks.
• Meet strict restrictions on migrating data out of countries of origin.
• Centralize companywide security management around a single point of control.
THE CHALLENGE:
Expand their data architecture to cope with global growth, while
exceeding the expectations of the customers.
30
Asurion – Digital Transformation
SOLUTION:
• Asurion developed a hybrid
data layer across the cloud &
on-premise data.
• A single point of access to the
data ensuring security
compliance.
• Removed complexities of data
access from the consumers,
enabling better integration &
improved analytics
32
The Architecture
1. Connect: Sources
2. Combine: Combine, Transform & Semantics
3. Consume: Consuming Applications
4. Dev/Ops
33
Current Requirements in Data Management
1. Faster & more accurate decision making
▪ Data Virtualization – Single platform for all enterprise data
2. Regulations, enterprise-wide governance & data security
▪ Data Virtualization – Unified metadata management for
governance and security
3. IT cost reduction
▪ Data Virtualization – Minimise data management infrastructure
Data Virtualization:
1. Enables multi-use data lake reducing costs &
increasing collaboration
2. Unifies disparate data sources in real-time
3. Supports self-service & data discovery
4. Centralises governance & security of enterprise
data assets
KEY TAKEAWAYS
35
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
https://www.denodo.com/TestDrive
GET STARTED TODAY
36
Denodo’s 2020 Global Cloud Survey Webinar
37
Useful Links
• Data Virtualization for Dummies - Learn how to put data virtualization to
work in your organisation: integrate all data sources, deliver big data solutions
that work, take the pain out of cloud adoption and drive digital
transformation.
• Data Virtualization: The Modern Data Integration Solution - Data
virtualization is a modern data integration approach that is already meeting
today’s data integration challenges, providing the foundation for data
integration in the future. Download this whitepaper to learn more about:
The fundamental challenge for organizations today, why traditional solutions
fall short and why data virtualization is the core solution.
38
Denodo
The Leader in Data Virtualization
DENODO OFFICES, CUSTOMERS, PARTNERS
Palo Alto, CA.
Global presence throughout North America,
EMEA, APAC, and Latin America.
LEADERSHIP
▪ Longest continuous focus on data
virtualization – since 1999
▪ Leader in 2018 Forrester Wave – Big
Data Fabric
▪ Winner of numerous awards
CUSTOMERS
~800 customers, including many F500 and
G2000 companies across every major industry
have gained significant business agility and ROI.
FINANCIALS
Backed by $4B+ private equity firm.
50+% annual growth; Profitable.
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.

DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
