Unraveling the Data Lake: MPP Integration within a Logical Data Fabric
Antonio Tortosa
Technical Consultant | Denodo
AGENDA
1. The challenge of cloud object storage
2. Incorporating Massively Parallel Processing (MPP) engines into a logical data fabric
3. Denodo Platform and Presto
The challenge of cloud object storage
The simplified version of object storage
Source: Amazon S3, "How it works"
■ Cheap storage for backups and for old or rarely used data
■ Ingestion of third-party data
■ Offloading non-critical workloads to cheaper systems
■ A data science playground
The reality of enterprise data strategy
[Diagram: on-prem data (loaded via CDC and ETL), a Data Lake / Object Storage, and an Enterprise Data Warehouse, used by Business Intelligence, Reporting, Data Discovery, and Other Apps]
The missing pieces
■ Processing: an engine capable of effectively processing the stored data
○ However, an MPP engine alone is not enough, as the failures of earlier Data Lake projects have shown
■ Integration: a logical model that serves a common, canonical view of the data ecosystem
○ Data in object storage is only a portion of the organization's data. All data should be managed consistently, regardless of location
■ Data management: fine-grained security and data governance
○ Easy data discovery through documentation, classification, and search capabilities
○ Fine-grained security and access control
Incorporating MPP into a logical data fabric
Parallel Processing of object storage data
[Diagram: the Logical Layer and the MPP Coordinator distribute work to four MPP Workers, which read from Object Storage; legend: data query, data flow, other calls]
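The coordinator/worker split in this diagram follows the scatter-gather pattern common to MPP engines: the coordinator assigns pieces of the data to workers, each worker computes a partial result, and the coordinator merges them. The sketch below is illustrative only. It assumes an in-memory dict stands in for object storage and threads stand in for worker nodes; `OBJECT_STORE`, `plan_splits`, `worker_scan`, and `run_query` are hypothetical names, not Presto or Denodo APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for object storage: object key -> rows of (store_id, amount).
OBJECT_STORE = {
    "sales/part-0.parquet": [(1, 10.0), (2, 5.0)],
    "sales/part-1.parquet": [(1, 2.5), (3, 7.0)],
    "sales/part-2.parquet": [(2, 1.0)],
    "sales/part-3.parquet": [(3, 3.0), (1, 4.0)],
}


def plan_splits(keys, num_workers):
    """Coordinator step: assign object keys to workers round-robin."""
    splits = [[] for _ in range(num_workers)]
    for i, key in enumerate(sorted(keys)):
        splits[i % num_workers].append(key)
    return splits


def worker_scan(keys):
    """Worker step: read the assigned objects and compute a partial SUM(amount) per store."""
    partial = Counter()
    for key in keys:
        for store_id, amount in OBJECT_STORE[key]:
            partial[store_id] += amount
    return partial


def run_query(num_workers=4):
    """Coordinator: scatter splits to workers in parallel, then merge the partial aggregates."""
    splits = plan_splits(OBJECT_STORE.keys(), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_scan, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values rather than replacing them
    return dict(total)


print(run_query())  # {1: 16.5, 2: 6.0, 3: 10.0}
```

The merge step works because SUM is decomposable: partial sums can be combined in any order, so the result does not depend on how files are distributed across workers.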
Integration of object storage with the data ecosystem
[Diagram: Logical Layer, MPP Coordinator, Other Sources, four MPP Workers, and Object Storage; legend: data query, data flow, other calls]
Denodo Platform and Presto
The process at a glance
Introspection
Denodo can connect to the object storage (for example, S3 buckets) and graphically browse its folders and files. Parquet files, folders with content, and partitions are detected automatically, and developers choose which ones will become Denodo base views.
Mapping
Denodo connects to Presto and creates the structures needed to map the object storage files into the target schema, automatically detecting field data types and partitions. Denodo then creates base views from these tables.
Execution
At execution time, Denodo sends the query to Presto, which now has the object storage files natively mapped. If the query also needs other Denodo data sources, Presto uses its Denodo connector to pull that data into the worker nodes' memory in real time.
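To make the mapping step more concrete, here is a sketch of the kind of DDL an MPP engine needs before it can query Parquet files in place. It assumes Presto's Hive connector, whose `CREATE TABLE ... WITH (format, external_location, partitioned_by)` table properties map a table onto existing files. The catalog, schema, column, and bucket names are hypothetical, the helper functions are illustrative, and this is not a description of the statements Denodo actually generates.

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def hive_external_table_ddl(catalog, schema, table, columns, partition_keys, location, fmt="PARQUET"):
    """Build a Hive-connector CREATE TABLE statement that maps existing files in place.

    columns: list of (name, presto_type) pairs; partition_keys: names taken from columns.
    The Hive connector expects partition columns last, so they are moved to the end.
    """
    names = [name for name, _ in columns]
    missing = [key for key in partition_keys if key not in names]
    if missing:
        raise ValueError(f"partition keys not found in columns: {missing}")
    regular = [col for col in columns if col[0] not in partition_keys]
    partitions = [col for key in partition_keys for col in columns if col[0] == key]
    column_sql = ",\n  ".join(f"{quote_ident(n)} {t}" for n, t in regular + partitions)
    props = [f"format = {quote_literal(fmt)}", f"external_location = {quote_literal(location)}"]
    if partition_keys:
        props.append("partitioned_by = ARRAY[" + ", ".join(quote_literal(k) for k in partition_keys) + "]")
    target = ".".join(quote_ident(part) for part in (catalog, schema, table))
    return f"CREATE TABLE {target} (\n  {column_sql}\n)\nWITH (\n  " + ",\n  ".join(props) + "\n)"


# Hypothetical example: map Parquet files for store_sales, partitioned by year.
ddl = hive_external_table_ddl(
    "hive", "lake", "store_sales",
    [("ss_sold_year", "varchar"), ("ss_customer_id", "bigint"), ("ss_net_paid", "double")],
    ["ss_sold_year"],
    "s3://example-bucket/store_sales/",
)
print(ddl)
```

For a partitioned external table, the existing partitions typically still have to be registered in the metastore before queries can see them. That is part of the manual work that, according to the deck, Denodo automates.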
Introspection of Object Storage
[Diagram: three MPP Workers reading from Object Storage]
■ The MPP workers need the object storage files mapped internally as tables.
○ Data engineers typically do this manually, using different tools to navigate the object storage and to create the tables in the MPP engine.
■ Denodo simplifies this process by providing a single, unified point of access.
○ The same tool used to introspect the object storage also manages the mapping of files to tables.
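Automatic detection of Parquet files, folders, and partitions is commonly based on Hive-style path conventions such as `table/year=2023/part-0.parquet`. The sketch below shows that idea under two assumptions: the object keys have already been listed (for example, from an S3 bucket), and partitions follow the `name=value` folder convention. `detect_tables` and the example keys are hypothetical and do not represent Denodo's introspection logic.

```python
def detect_tables(object_keys):
    """Group Parquet object keys into candidate tables with Hive-style partitions.

    The table root is the folder path before the first "name=value" folder;
    each "name=value" folder after it contributes one partition column.
    """
    tables = {}
    for key in object_keys:
        if not key.endswith(".parquet"):
            continue  # skip markers such as _SUCCESS and non-Parquet objects
        folders = key.split("/")[:-1]
        first = next((i for i, f in enumerate(folders) if "=" in f), len(folders))
        root = "/".join(folders[:first])
        partition = tuple(tuple(f.split("=", 1)) for f in folders[first:] if "=" in f)
        info = tables.setdefault(root, {"partition_columns": [name for name, _ in partition],
                                        "partitions": set(), "files": 0})
        if partition:
            info["partitions"].add(partition)
        info["files"] += 1
    return tables


keys = [
    "store_sales/ss_sold_year=2022/part-0.parquet",
    "store_sales/ss_sold_year=2023/part-0.parquet",
    "store_sales/ss_sold_year=2023/part-1.parquet",
    "store_sales/_SUCCESS",
    "reference/stores/stores.parquet",
]
for root, info in detect_tables(keys).items():
    print(root, info["partition_columns"], sorted(info["partitions"]), info["files"])
```

A developer-facing tool would present these candidates for selection, matching the deck's description that developers choose which detected items become base views.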
Introspection of Object Storage
Mapping of Object Storage files into views
Execution - Presto with other sources
[Diagram: Logical Layer, MPP Coordinator, Other Sources, four MPP Workers, and Object Storage; legend: SQL query, data flow, other calls]
Execution - Presto with other sources
[Screenshot: row counts of 90,859 rows and 2,880,404 rows]
Execution - Presto with other sources
Fully delegated query to Presto
■ customer_crm is brought into the worker nodes' memory in real time, so it is referenced locally as tmp_231_885_0_2549
■ store_sales was previously mapped by Denodo Platform and is referenced in this query as vdp_table_167509671357
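Pulling one source (customer_crm) into worker memory and joining it there with a table already mapped from object storage (store_sales) resembles a broadcast hash join: the smaller side is made available to every worker, which builds a hash table on it and probes that table with its own slice of the larger side. The sketch below illustrates that technique only. The rows, the column names `customer_id`, `segment`, and `amount`, and the helper functions are hypothetical, and it does not describe the actual Presto or Denodo execution plan.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Hypothetical small side: rows pulled in real time from the CRM source.
customer_crm = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]

# Hypothetical large side: store_sales rows, already split across workers
# (in an MPP engine, each split would come from object storage files).
store_sales_splits = [
    [{"customer_id": 1, "amount": 20.0}, {"customer_id": 3, "amount": 5.0}],
    [{"customer_id": 2, "amount": 12.0}, {"customer_id": 1, "amount": 8.0}],
]


def build_hash_table(rows, key):
    """Index the broadcast (small) side by join key."""
    table = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table


def probe(split, hash_table, key):
    """Worker step: inner-join one split of the large side against the hash table."""
    return [{**match, **row} for row in split for match in hash_table.get(row[key], [])]


def broadcast_hash_join(small_rows, large_splits, key):
    """Build the hash table once and probe every split in parallel.

    Threads share one hash table here; a real engine would ship a copy to each node.
    """
    hash_table = build_hash_table(small_rows, key)
    with ThreadPoolExecutor(max_workers=len(large_splits)) as pool:
        parts = pool.map(partial(probe, hash_table=hash_table, key=key), large_splits)
        return [row for part in parts for row in part]


joined = broadcast_hash_join(customer_crm, store_sales_splits, "customer_id")
revenue = {}
for row in joined:
    revenue[row["segment"]] = revenue.get(row["segment"], 0.0) + row["amount"]
print(revenue)  # {'retail': 28.0, 'wholesale': 12.0}
```

Broadcasting pays off when one input is much smaller than the other. When both inputs are large, MPP engines typically hash-partition both sides on the join key instead, so that each worker joins only matching partitions.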
Enterprise Data Architecture
Final Solution
[Diagram: on-prem data (loaded via CDC and ETL), a Data Lake / Object Storage, and an Enterprise Data Warehouse, accessed through Denodo Virtual DataPort, Denodo Web Services, Denodo MPP, and Denodo Data Catalog by Business Intelligence, Reporting, and Other Apps]
CLOSING REMARKS
▪ There is renewed interest in Data Lakes because cloud object storage solves some of their original drawbacks.
▪ However, Data Lakes alone do not make a complete enterprise data strategy. We must also consider:
○ Cost-effective processing
○ Integration with other sources within the enterprise data ecosystem
○ Data governance and data security
▪ Denodo Platform has always provided these. With its 2023Q1 update, its integration with Presto offers an even better way to bring Data Lakes into a logical data fabric.
▪ Now, from a unified access layer, Denodo customers can:
○ Introspect object storage files
○ Integrate object storage data with other corporate sources to increase data adoption
▪ With this, Denodo Platform seamlessly colocates the data in the Presto MPP cluster to accelerate query execution.
Q&A
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
