DATA VIRTUALIZATION
Packed Lunch Webinar Series
Sessions Covering Key Data Integration
Challenges Solved with Data Virtualization
Data Lakes: A Logical Approach for
Faster Unified Insights
Robin Tandon
Product Marketing
Director | Denodo
Chris Walters
Senior Solutions
Consultant | Denodo
Agenda
1. What is a data lake?
2. Why do they exist?
3. Some of the challenges of data lakes
4. The benefits of a logical approach to data lakes
5. Customer case study
6. Demo
7. Conclusion
8. Q & A
A Brief History
Data Lake
Etymology of “data lake”
https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/
Pentaho’s CTO James Dixon is credited with coining
the term "data lake". He described it in his blog in
2010:
"If you think of a data mart as a store of bottled
water – cleansed and packaged and structured
for easy consumption – the data lake is a large
body of water in a more natural state. The
contents of the data lake stream in from a
source to fill the lake, and various users of the
lake can come to examine, dive in, or take
samples."
Data lakes were born to efficiently
address the challenge of cost reduction:
data lakes allow for cheap, efficient
storage of very large amounts of data
Cloud implementations reduced the
complexity of managing a large data
lake
The Data Lake – Architecture I
Distributed File System
Cheap storage for large data volumes
• Support for multiple file formats (Parquet, CSV,
JSON, etc.)
• Examples:
• On-prem: HDFS
• Cloud native: AWS S3, Azure ADLS, Google GCS
The Data Lake – Architecture II
Distributed File System
Execution Engine
Massively parallel & scalable execution engine
• Cheaper execution than traditional EDW
architectures
• Decoupled from storage
• Doesn’t require specialized HW
• Examples:
• SQL-on-Hadoop engines: Spark, Hive, Impala, Drill,
Dremio, Presto, etc.
• Cloud native: AWS Redshift, Snowflake, AWS Athena,
Delta Lake, GCP BigQuery
The Data Lake – Architecture III
Adoption of new transformation techniques
• Data ingested is normally raw and unusable by end
users
• Data is transformed and moved to different
“zones” with different levels of curation
• End users only access the refined zone
• Use of ELT as a cheaper transformation technique
than ETL
• Use of the engine and storage of the lake for data
transformation instead of external ETL flows
• Removes the need for additional staging HW
[Diagram: Raw zone → Trusted zone → Refined zone, built on the Distributed File System and the Execution Engine]
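The zone-based ELT flow described on this slide can be illustrated with a minimal sketch. It assumes a single in-memory SQLite database standing in for both the lake's storage and its execution engine; the table names (raw_sales, trusted_sales, refined_sales_by_country), the columns and the cleansing rules are hypothetical and do not come from the webinar.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows first, then transform them zone by zone inside the engine."""
    conn = sqlite3.connect(":memory:")  # stand-in for lake storage + engine

    # Load: land the data untouched in the raw zone (everything is text).
    conn.execute("CREATE TABLE raw_sales (order_id TEXT, country TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

    # Transform 1 (trusted zone): cast types, normalize country codes,
    # and drop rows with an empty country or a non-numeric amount.
    conn.execute("""
        CREATE TABLE trusted_sales AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               UPPER(TRIM(country))      AS country,
               CAST(amount AS REAL)      AS amount
        FROM raw_sales
        WHERE TRIM(country) <> '' AND amount GLOB '[0-9]*'
    """)

    # Transform 2 (refined zone): a curated aggregate for end users.
    conn.execute("""
        CREATE TABLE refined_sales_by_country AS
        SELECT country, SUM(amount) AS total_amount, COUNT(*) AS orders
        FROM trusted_sales
        GROUP BY country
    """)

    rows = conn.execute(
        "SELECT country, total_amount, orders "
        "FROM refined_sales_by_country ORDER BY country"
    ).fetchall()
    conn.close()
    return rows


sample = [("1", " us ", "10.5"), ("2", "US", "4.5"),
          ("3", "de", "n/a"), ("4", "DE", "20"), ("5", "", "7")]
print(run_elt(sample))
```

Both transformations run as SQL inside the same engine that stores the data, which is the point of ELT here: no external ETL server or staging hardware takes part.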
Data Lake Example – AWS
• Data ingested using AWS Glue (or other ETL tools)
• Raw data stored in S3 object store
• Maintain fidelity and structure of data
• Metadata extracted/enriched using Glue Data Catalog
• Business rules/data quality (DQ) rules applied to S3 data as it is copied to
Trusted Zone data stores
• Trusted Zone contains more than one data store – select
best data store for data and data processing
• Refined Zone contains data for consumers – curated data
sets (data marts?)
• Refined Zone data stores differ – Redshift, Athena,
Snowflake, …
[Diagram: Data sources (internal & external) → Ingestion (AWS Glue) → Raw zone (S3 for raw data) → Trusted zone → Refined zone → Consumers (data portals, BI/visualization, analytic workbench, mobile apps, etc.)]
Hadoop-Based Data Lakes – A Data Scientist’s Playground
The early data scientists saw Hadoop as their
personal supercomputer.
Hadoop-based Data Lakes helped democratize
access to state-of-the-art supercomputing with
off-the-shelf HW (and later cloud)
The industry push for BI made Hadoop-based
solutions the standard to bring modern
analytics to any corporation
Hadoop-based Data Lakes became
“data science silos”
Can data lakes also address the
other data management
challenges?
Can they provide fast decision
making with proper
governance and security?
Changing the Data Lake Goals
“The popular view is that a
data lake will be the one
destination for all the data
in their enterprise and the
optimal platform for all
their analytics.”
Nick Heudecker, Gartner
Rick Van der Lans, R20 Consultancy
Multi‐purpose data lakes are data delivery environments developed to support a
broad range of users, from traditional self‐service BI users (e.g. finance, marketing,
human resource, transport) to sophisticated data scientists.
Multi‐purpose data lakes allow a broader and deeper use of the data lake
investment without minimizing the potential value for data science and without
making it an inflexible environment.
The Data Lake as the Repository of All Data
To efficiently enable self-service initiatives, a data lake must provide access to all company data.
Is that realistic? And even if possible, it comes with multiple trade-offs:
• Huge up-front investment: creating ingestion pipelines for all company datasets into the
lake is costly
• Questionable ROI as a lot of that data may never be used
• Replicate the EDW? Replace it entirely?
• Large recurrent maintenance costs: those pipelines need to be constantly modified as
data structures change in the sources
• Risk of inconsistencies: data needs to be frequently synchronized to avoid stale datasets
• Loss of capabilities: data lake capabilities may differ from those of original sources, e.g.
quick access by ID in operational RDBMS
Efficient use of the data lake to accelerate insights comes at a price in cost,
time-to-market and governance
COST
GOVERNANCE
Purpose-specific data lakes
If we restrict the use of the data lake to a specific use case (e.g. data science), some of those
problems go away.
However, to maintain the capabilities for fast insights and self-service, we add an additional
burden to the end user:
• Higher complexity: end users need to find where data is and how to use it
• Risk of inconsistencies: data may be in multiple places, in different formats
and calculated at different times
• Loss of security: frustrations increase the use of shadow IT, “personal”
extracts, uncontrolled data prep flows, etc.
An environment with multiple purpose-specific systems slows down TTM and
jeopardizes security and governance
TTM
SECURITY
Data Lakes in the ‘Pit of Despair’
Data Lakes are 2-5 years from
Plateau of Productivity and are
deep in the
Trough of Disillusionment
Gartner – Hype Cycle Data Management July 2021
Gartner – The Evolution of Analytical Environments
This is a Second Major Cycle of Analytical Consolidation
[Figure: timeline of analytical environments. Operational applications, and later IoT data and other new data, feed cubes, marts, ODS, staging/ingest areas, data warehouses and data lakes.]
1980s, Pre EDW: Fragmented/nonexistent analysis
› Multiple sources
› Multiple structured sources
1990s, EDW: Unified analysis
› Consolidated data
› "Collect the data"
› Single server, multiple nodes
› More analysis than any one server can provide
2000s, Post EDW: Fragmented analysis
› "Collect the data" (into different repositories)
› New data types, processing, requirements
› Uncoordinated views
2010s, LDW: Unified analysis
› Logically consolidated view of all data
› "Connect and collect"
› Multiple servers, of multiple nodes
› More analysis than any one system can provide
©2018 Gartner, Inc. ID: 342254
“Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”. Henry Cook, Gartner April 2018
Gartner – Logical Data Warehouse
“Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”. Henry Cook, Gartner April 2018
DATA VIRTUALIZATION
“…Data lakes lack semantic consistency and governed
metadata. Meeting the needs of wider audiences requires
curated repositories with governance, semantic
consistency and access controls.”
How can a logical data
fabric approach help?
Faster Time to Market for data projects
A data virtualization layer allows you to connect directly to all kinds of data sources: the EDW,
application databases, SaaS applications, etc.
This means that not all data needs to be replicated to the data lake for consumers to access it
from a single (virtual) repository.
In some cases it makes sense to replicate data in the lake; in others it doesn’t. DV keeps both options open
▪ Data can be accessed immediately, easily improving TTM and ROI of the lake
▪ If data is not useful, time was not lost preparing pipelines and copying data
▪ Can ingest and synchronize data into the lake efficiently when needed
▪ Denodo can load and update data into the data lake natively, using Parquet, and parallel loads
▪ Execution is pushed down to original sources, taking advantage of their capabilities
▪ Especially significant in the case of EDW with strong processing capabilities
TTM
COST
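To make the push-down point concrete, here is a minimal sketch. It is not Denodo's optimizer: it assumes a hypothetical EDW modeled as an in-memory SQLite database with a made-up sales table, and compares pulling every row into the virtual layer against delegating the aggregation to the source.

```python
import sqlite3
from collections import defaultdict


def make_edw():
    """Hypothetical EDW source holding a small sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (country TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("US", 10.0), ("US", 5.0), ("DE", 20.0), ("DE", 1.0), ("FR", 7.0)],
    )
    return conn


def totals_without_pushdown(source):
    """Fetch every row and aggregate in the virtual layer."""
    rows = source.execute("SELECT country, amount FROM sales").fetchall()
    totals = defaultdict(float)
    for country, amount in rows:
        totals[country] += amount
    return dict(totals), len(rows)  # result, rows transferred


def totals_with_pushdown(source):
    """Delegate the GROUP BY to the source and fetch only the result."""
    rows = source.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country"
    ).fetchall()
    return dict(rows), len(rows)  # result, rows transferred


edw = make_edw()
print(totals_without_pushdown(edw))
print(totals_with_pushdown(edw))
```

Both functions return the same totals, but the pushed-down version moves one row per country instead of one row per sale. The gap grows with the size of the source table, which is why push-down matters most for sources with strong processing capabilities.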
Easier self-service through a single delivery layer
From an end-user perspective, access to all data goes through a single layer that is in
charge of delivering any data, regardless of its actual physical location
A single delivery layer also allows you to enforce security and governance policies
The virtual layer becomes the “delivery zone” of the data lake, offering modeling and
caching capabilities, documentation and output in multiple formats
GOVERNANCE
• Built-in rich modeling capabilities to tailor data models to end
users
• Integrated catalog, search and documentation capabilities
• Access via SQL, REST, OData and GraphQL with no additional
coding
• Advanced security controls, SSO, workload management,
monitoring, etc.
Accelerates query execution
Controlling data delivery separately from storage allows a virtual layer to accelerate
query execution, providing faster response than the sources alone
▪ Aggregate-aware capabilities to accelerate execution of
analytical queries
▪ Flexible caching options to materialize frequently used data:
▪ Full datasets
▪ Partial results
▪ Hybrid (cached content + updates from source in real time)
▪ Powerful optimization capabilities for multi-source federated
queries
PERFORMANCE
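The hybrid caching option (cached content plus real-time updates from the source) can be sketched as follows. This is an illustration under stated assumptions, not Denodo's implementation: the source is a hypothetical append-only SQLite table with increasing ids, and the names HybridCache, make_source and watermark are invented for the example.

```python
import sqlite3


def make_source(rows):
    """Hypothetical append-only source table with increasing ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


class HybridCache:
    """Serve cached rows plus rows added to the source since the last refresh."""

    def __init__(self, source):
        self.source = source
        self.cached_rows = []
        self.watermark = 0  # highest id already materialized in the cache

    def refresh(self):
        """Materialize the full dataset and remember how far it reaches."""
        self.cached_rows = self.source.execute(
            "SELECT id, amount FROM sales ORDER BY id"
        ).fetchall()
        self.watermark = self.cached_rows[-1][0] if self.cached_rows else 0

    def query(self):
        """Combine cached rows with only the newer rows read from the source."""
        fresh = self.source.execute(
            "SELECT id, amount FROM sales WHERE id > ? ORDER BY id",
            (self.watermark,),
        ).fetchall()
        return self.cached_rows + fresh


source = make_source([(1, 10.0), (2, 5.0)])
cache = HybridCache(source)
cache.refresh()
source.execute("INSERT INTO sales VALUES (3, 7.5)")  # arrives after the refresh
print(cache.query())
```

Only rows newer than the watermark are read from the source at query time, so results stay current without rereading the whole dataset. Sources that update or delete existing rows would need a different change-detection scheme.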
Denodo’s Logical Data Lake
[Diagram: Consuming tools (BI & reporting, mobile applications, predictive analytics, AI/ML, real-time dashboards) connect to the Logical Data Lake, which comprises the business catalog, security and governance, business delivery, the query engine and source abstraction. The logical layer sits on top of IT storage and processing: ETL, data warehouse, Kafka, files and the physical data lake (raw, trusted and refined zones on the Distributed File System and Execution Engine). The virtual layer acts as the delivery zone.]
Case Study
Leading Construction Manufacturer Improves Service
Delivery and Revenue
In business for over 90 years, the company is the world’s leading manufacturer of construction
and mining equipment, diesel and natural gas engines, industrial gas turbines and
diesel-electric locomotives.
Business Need
▪ Competitive pressure from low-cost Chinese
manufacturers
▪ Needed a proactive approach to customer
service to differentiate
▪ Sought to improve equipment and services
delivery through predictive maintenance
Solution
▪ Telemetry (IoT) data from sensors embedded in
the equipment is stored in Hadoop to perform
predictive analytics
▪ Denodo integrates analytics data with parts,
maintenance, and dealer information stored in
traditional systems
▪ It then feeds the predictive maintenance
information to a customer dashboard
Benefits
▪ Phased rollout systematically improved asset
performance and proactive maintenance
▪ Increased revenue from sale of services and
parts
▪ Reduced warranty costs of parts failure
▪ Future – optimize pricing for services and parts
among global service providers
Architectural Diagrams
Product Demonstration
Chris Walters
Sr. Solutions Consultant
Demo Scenario
Use Case
• Integrate data from three disparate sources
to determine the impact of a new
marketing campaign on total sales in
each country in which the company does business.
Data Sources
▪ Historical sales data stored in data lake
▪ Marketing campaigns managed in an
external cloud app
▪ Customer data stored in the EDW
[Diagram: the three sources (Sales, Campaign, Customer) are exposed as base views in the source abstraction layer, then combined, transformed and integrated into a Sales Evolution view for consumption]
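The integration logic behind a scenario like this one can be sketched in a few lines. This is not the demo itself, which runs in the Denodo platform: the three sources are modeled as hypothetical in-memory records, and the function name sales_evolution, the field layout and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def sales_evolution(sales, customers, campaign_start):
    """Total sales per country, split into before/after the campaign start.

    sales:     (customer_id, sale_date, amount) rows, e.g. from the data lake
    customers: (customer_id, country) rows, e.g. from the EDW
    campaign_start: start date taken from the campaign record (cloud app)
    """
    country_of = dict(customers)
    totals = defaultdict(lambda: {"before": 0.0, "after": 0.0})
    for customer_id, sale_date, amount in sales:
        country = country_of.get(customer_id)
        if country is None:
            continue  # sale with no matching customer record is skipped
        period = "after" if sale_date >= campaign_start else "before"
        totals[country][period] += amount
    return {country: dict(split) for country, split in totals.items()}


# Hypothetical sample data for the three sources.
lake_sales = [
    (1, date(2021, 1, 10), 100.0),
    (1, date(2021, 3, 5), 150.0),
    (2, date(2021, 1, 20), 80.0),
    (2, date(2021, 3, 15), 60.0),
    (3, date(2021, 3, 1), 40.0),  # customer 3 is unknown to the EDW
]
edw_customers = [(1, "US"), (2, "DE")]
campaign = {"name": "spring_promo", "start": date(2021, 3, 1)}

print(sales_evolution(lake_sales, edw_customers, campaign["start"]))
```

In a virtualized setup the same join and aggregation would be expressed once as a derived view over the three base views, so consumers query the combined result without knowing where each dataset physically lives.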
Demo
Key Takeaways
1. In most cases, not all the data is going to be in the
data lake
2. Large data lake projects are complex environments
that will benefit from a virtual ‘consumption’ layer
3. Data virtualization provides the governance and
management infrastructure required for a successful
data lake implementation
4. Data virtualization is more than just a data access
or services layer; it is a key component of a data
lake
Q&A
Next Steps
Try Denodo with the 30-day Free Trial
Whitepaper: Logical Data Fabric to the
Rescue: Integrating Data Warehouses, Data
Lakes, and Data Hubs
By Rick van der Lans
GET STARTED TODAY
denodo.com/en/denodo-platform/free-trials
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 

Recently uploaded (20)

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 

Data Lakes: A Logical Approach for Faster Unified Insights

  • 1. DATA VIRTUALIZATION Packed Lunch Webinar Series Sessions Covering Key Data Integration Challenges Solved with Data Virtualization
  • 2. Data Lakes: A Logical Approach for Faster Unified Insights Robin Tandon Product Marketing Director | Denodo Chris Walters Senior Solutions Consultant | Denodo
  • 3. Agenda 1. What is a data lake? 2. Why do they exist? 3. Some of the challenges of data lakes 4. The benefits of a logical approach to data lakes 5. Customer case study 6. Demo 7. Conclusion 8. Q&A
  • 5. Etymology of “data lake” https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/ Pentaho’s CTO James Dixon is credited with coining the term "data lake". He described it in his blog in 2010: "If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."
  • 6. Data lakes were born to address the challenge of cost reduction: they allow for cheap, efficient storage of very large amounts of data. Cloud implementation reduced the complexity of managing a large data lake.
  • 7. The Data Lake – Architecture I: Distributed File System. Cheap storage for large data volumes • Support for multiple file formats (Parquet, CSV, JSON, etc.) • Examples: • On-prem: HDFS • Cloud native: AWS S3, Azure ADLS, Google GCS
  • 8. The Data Lake – Architecture II: Distributed File System, Execution Engine. Massively parallel & scalable execution engine • Cheaper execution than traditional EDW architectures • Decoupled from storage • Doesn’t require specialized HW • Examples: • SQL-on-Hadoop engines: Spark, Hive, Impala, Drill, Dremio, Presto, etc. • Cloud native: AWS Redshift, Snowflake, AWS Athena, Delta Lake, GCP BigQuery
  • 9. The Data Lake – Architecture III: Adoption of new transformation techniques • Ingested data is normally raw and unusable by end users • Data is transformed and moved to different “zones” with different levels of curation • End users only access the refined zone • Use of ELT as a cheaper transformation technique than ETL • Use of the engine and storage of the lake for data transformation instead of external ETL flows • Removes the need for additional staging HW [Diagram labels: Raw zone, Trusted zone, Refined zone, Distributed File System, Execution Engine]
  • 10. Data Lake Example – AWS • Data ingested using AWS Glue (or other ETL tools) • Raw data stored in S3 object store • Maintain fidelity and structure of data • Metadata extracted/enriched using Glue Data Catalog • Business rules/DQ rules applied to S3 data as copied to Trusted Zone data stores • Trusted Zone contains more than one data store – select best data store for data and data processing • Refined Zone contains data for consumer – curated data sets (data marts?) • Refined Zone data stores differ – Redshift, Athena, Snowflake, … [Diagram labels: Data Sources (internal & external), Ingestion (AWS Glue), Raw Zone (S3 for raw data), Trusted Zone, Refined Zone, Consumers (data portals, BI – visualization, analytic workbench, mobile apps, etc.)]
  • 11. Hadoop-Based Data Lakes – A Data Scientist’s Playground. The early data scientists saw Hadoop as their personal supercomputer. Hadoop-based data lakes helped democratize access to state-of-the-art supercomputing with off-the-shelf HW (and later cloud). The industry push for BI made Hadoop-based solutions the standard to bring modern analytics to any corporation. Hadoop-based data lakes became “data science silos”.
  • 12. Can data lakes also address the other data management challenges? Can they provide fast decision making with proper governance and security?
  • 13. Changing the Data Lake Goals “The popular view is that a data lake will be the one destination for all the data in their enterprise and the optimal platform for all their analytics.” Nick Heudecker, Gartner
  • 14. Rick van der Lans, R20 Consultancy: Multi‐purpose data lakes are data delivery environments developed to support a broad range of users, from traditional self‐service BI users (e.g. finance, marketing, human resource, transport) to sophisticated data scientists. Multi‐purpose data lakes allow a broader and deeper use of the data lake investment without minimizing the potential value for data science and without making it an inflexible environment.
  • 15. The Data Lake as the Repository of All Data. To efficiently enable self-service initiatives, a data lake must provide access to all company data. Is that realistic? And even if possible, it comes with multiple trade-offs: • Huge up-front investment: creating ingestion pipelines for all company datasets into the lake is costly • Questionable ROI as a lot of that data may never be used • Replicate the EDW? Replace it entirely? • Large recurrent maintenance costs: those pipelines need to be constantly modified as data structures change in the sources • Risk of inconsistencies: data needs to be frequently synchronized to avoid stale datasets • Loss of capabilities: data lake capabilities may differ from those of original sources, e.g. quick access by ID in operational RDBMS. Efficient use of the data lake to accelerate insights comes at the cost of price, time-to-market and governance. [Slide labels: COST, GOVERNANCE]
  • 16. Purpose-specific data lakes. If we restrict the use of the data lake to a specific use case (e.g. data science), some of those problems go away. However, to maintain the capabilities for fast insights and self-service, we add an additional burden to the end user: • Higher complexity: end users need to find where data is and how to use it • Risk of inconsistencies: data may be in multiple places, in different formats and calculated at different times • Loss of security: frustrations increase the use of shadow IT, “personal” extracts, uncontrolled data prep flows, etc. An environment with multiple purpose-specific systems slows down TTM and jeopardizes security and governance. [Slide labels: TTM, SECURITY]
  • 17. Data Lakes in the ‘Pit of Despair’: Data lakes are 2-5 years from the Plateau of Productivity and are deep in the Trough of Disillusionment. Gartner – Hype Cycle Data Management July 2021
  • 18. Gartner – The Evolution of Analytical Environments: This is a Second Major Cycle of Analytical Consolidation. [Figure: timeline of analytical architectures. 1980s, pre-EDW: fragmented/nonexistent analysis › multiple sources › multiple structured sources. 1990s, EDW: unified analysis › consolidated data › "Collect the data" › single server, multiple nodes › more analysis than any one server can provide. 2000s, post-EDW: fragmented analysis › "Collect the data" (into different repositories) › new data types, processing, requirements › uncoordinated views. 2010s, LDW: unified analysis › logically consolidated view of all data › "Connect and collect" › multiple servers, of multiple nodes › more analysis than any one system can provide.] Source: “Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”. Henry Cook, Gartner April 2018. ©2018 Gartner, Inc. ID: 342254
  • 19. Gartner – Logical Data Warehouse “Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”. Henry Cook, Gartner April 2018 [Diagram label: DATA VIRTUALIZATION]
  • 20. “…Data lakes lack semantic consistency and governed metadata. Meeting the needs of wider audiences require curated repositories with governance, semantic consistency and access controls.”
  • 21. How can a logical data fabric approach help?
  • 22. Faster Time to Market for data projects. A data virtualization layer allows you to connect directly to all kinds of data sources: the EDW, application databases, SaaS applications, etc. This means that not all data needs to be replicated to the data lake for consumers to access it from a single (virtual) repository. In some cases, it makes sense to replicate in the lake, for others it doesn’t. DV opens that door. ▪ Data can be accessed immediately, easily improving TTM and ROI of the lake ▪ If data is not useful, time was not lost preparing pipelines and copying data ▪ Can ingest and synchronize data into the lake efficiently when needed ▪ Denodo can load and update data into the data lake natively, using Parquet, and parallel loads ▪ Execution is pushed down to original sources, taking advantage of their capabilities ▪ Especially significant in the case of EDW with strong processing capabilities [Slide labels: TTM, COST]
  • 23. Easier self-service through a single delivery layer. From an end user perspective, access to all data is done through a single layer, in charge of delivery of any data, regardless of its actual physical location. A single delivery layer also allows you to enforce security and governance policies. The virtual layer becomes the “delivery zone” of the data lake, offering modeling and caching capabilities, documentation and output in multiple formats. • Built-in rich modeling capabilities to tailor data models to end users • Integrated catalog, search and documentation capabilities • Access via SQL, REST, OData and GraphQL with no additional coding • Advanced security controls, SSO, workload management, monitoring, etc. [Slide label: GOVERNANCE]
  • 24. Accelerates query execution. Controlling data delivery separately from storage allows a virtual layer to accelerate query execution, providing faster response than the sources alone ▪ Aggregate-aware capabilities to accelerate execution of analytical queries ▪ Flexible caching options to materialize frequently used data: ▪ Full datasets ▪ Partial results ▪ Hybrid (cached content + updates from source in real time) ▪ Powerful optimization capabilities for multi-source federated queries [Slide label: PERFORMANCE]
  • 25. Denodo’s Logical Data Lake [Diagram labels: IT Storage and Processing (ETL, Data Warehouse, Kafka, Files, Physical Data Lake with Raw zone, Trusted zone, Refined zone, Distributed File System, Execution Engine); Logical Data Lake / Delivery Zone (Query Engine, Source Abstraction, Business Delivery, Business Catalog, Security and Governance); Consuming Tools (BI & Reporting, Mobile Applications, Predictive Analytics, AI/ML, Real-time dashboards)]
  • 27. Case Study: Leading Construction Manufacturer Improves Service Delivery and Revenue. The company has been in business for over 90 years and is the world’s leading manufacturer of construction and mining equipment, diesel and natural gas engines, industrial gas turbines and diesel-electric locomotives. Business Need ▪ Competitive pressure from low-cost Chinese manufacturers ▪ Needed a proactive approach to customer service to differentiate ▪ Sought to improve equipment and services delivery through predictive maintenance Solution ▪ Telemetry (IoT) data from sensors embedded in the equipment is stored in Hadoop to perform predictive analytics ▪ Denodo integrates analytics data with parts, maintenance, and dealer information stored in traditional systems ▪ It then feeds the predictive maintenance information to a customer dashboard Benefits ▪ Phased rollout systematically improved asset performance and proactive maintenance ▪ Increased revenue from sale of services and parts ▪ Reduced warranty costs of parts failure ▪ Future – optimize pricing for services and parts among global service providers
  • 30. Demo Scenario. Use Case • Integrate data from 3 disparate sources to determine the impact of a new marketing campaign on total sales in each country in which they do business. Data Sources ▪ Historical sales data stored in data lake ▪ Marketing campaigns managed in an external cloud app ▪ Customer data stored in the EDW [Diagram labels: Sources; Combine, Transform & Integrate; Consume; Base View / Source Abstraction: Sales, Campaign, Customer; Sales Evolution]
  • 32. Key Takeaways 1. In most cases, not all the data is going to be in the data lake 2. Large data lake projects are complex environments that will benefit from a virtual ‘consumption’ layer 3. Data virtualization provides a governance and management infrastructure required for successful data lake implementation 4. Data virtualization is more than just a data access or services layer; it is a key component for a data lake
  • 33. Q&A
  • 34. Next Steps: Try Denodo with the 30-day Free Trial. Whitepaper: Logical Data Fabric to the Rescue: Integrating Data Warehouses, Data Lakes, and Data Hubs, by Rick van der Lans. GET STARTED TODAY: denodo.com/en/denodo-platform/free-trials
  • 35. Thanks! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
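
The demo scenario on slide 30 (sales in the data lake, campaigns in a cloud app, customers in the EDW) can be illustrated with a minimal sketch. This is not Denodo's API or the demo's actual code: the three in-memory SQLite databases are hypothetical stand-ins for the real sources, and every table name, column name and sample row is an assumption made for illustration. The sketch pushes the aggregation down to the "lake" source and combines the results in a small virtual-layer function, without copying customer or campaign data into the lake.

```python
import sqlite3
from collections import defaultdict


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical stand-ins for the three sources named on the slide.
lake = make_source(
    "CREATE TABLE sales (customer_id INTEGER, campaign_id INTEGER, amount REAL)",
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 10, 100.0), (2, 10, 50.0), (3, None, 70.0), (1, None, 30.0)],
)
cloud_app = make_source(
    "CREATE TABLE campaign (campaign_id INTEGER, name TEXT)",
    "INSERT INTO campaign VALUES (?, ?)",
    [(10, "Spring Launch")],
)
edw = make_source(
    "CREATE TABLE customer (customer_id INTEGER, country TEXT)",
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "US"), (2, "DE"), (3, "US")],
)


def sales_by_country_and_campaign():
    """Combine the three sources into total sales per (country, campaign)."""
    # Push the aggregation down to the source that holds the large dataset.
    sales = lake.execute(
        "SELECT customer_id, campaign_id, SUM(amount) "
        "FROM sales GROUP BY customer_id, campaign_id"
    ).fetchall()
    # Small lookup tables are fetched from their own sources on demand.
    country_of = dict(edw.execute("SELECT customer_id, country FROM customer"))
    campaign_name = dict(cloud_app.execute("SELECT campaign_id, name FROM campaign"))

    totals = defaultdict(float)
    for customer_id, campaign_id, amount in sales:
        key = (country_of[customer_id], campaign_name.get(campaign_id, "no campaign"))
        totals[key] += amount
    return dict(totals)


result = sales_by_country_and_campaign()
for (country, campaign), total in sorted(result.items()):
    print(country, campaign, total)
```

Comparing the "Spring Launch" rows with the "no campaign" rows per country gives a rough view of campaign impact. A real virtualization layer would also handle query optimization, security and caching, which this sketch leaves out.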