Architecture and Performance
Considerations in the Logical Data Lake
Dr. Alberto Pan, Chief Technical Officer
Agenda
1. Data Lake Architecture
2. Data Virtualization in the Logical Data Lake
3. Performance: ‘Move Processing to the Data’
4. Performance: Choosing the Best Execution Plan
5. Example Scenario: The Numbers
Data Lake Architecture
Architecture of the Data Lake
[Figure: Data Lake architecture diagram. Sources (sensor, machine/log, social, clickstream, internet, image/video and unstructured enterprise content; enterprise and cloud applications) are ingested via ETL, CDC, Sqoop and Flume/Kafka, in batch or in real time (on-demand / streaming), into Hadoop (HDFS, YARN workload management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk), NoSQL, EDW, in-memory (SAP HANA, …), analytical appliances, cloud DW (Redshift, …) and ODS stores. These feed reporting, dashboards, scorecards, alerts, real-time decision management, self-service data discovery, search, predictive, statistical (R) and text analytics, and data mining. Metadata management, data governance and data security span all layers.]
Questions for the Logical Data Lake:
How can I combine data from several systems while ensuring good performance?
How can I insulate consuming applications from technology changes and evolving requirements?
How can I enforce consistent security and governance policies across the Data Lake?
The Logical Data Lake Architecture
Integrated view of multiple systems: Hadoop, EDW, streaming, in-memory, …
Data Virtualization in the Logical Data Lake
Architecture of the Logical Data Lake
[Figure: The same Data Lake architecture with a Data Virtualization layer placed between the data stores and the consuming applications. The layer provides data abstraction, data transformation, data federation, data caching, data services, data search & discovery, optimization, governance and security.]
What is Needed?
Requirements for the Integration Component in the Logical Data Lake:
Ability to answer ad-hoc queries combining data from several systems
Performance comparable to physical approaches
Ability to expose different logical views over the same data
Single entry point to apply security and governance policies
Comprehensive, granular security support
Denodo Data Virtualization is the only option that meets all of these requirements.
Performance: Move Processing to the Data
Move Processing to the Data
Process the data where it resides
Process the data locally, where it resides
The DV system combines the partial results
Minimizes network traffic
Leverages specialized data sources
Move Processing to the Data: Example 1
Obtain Total Sales By Product (Naive Strategy)
Naive Strategy:
350M rows moved through the network
Move Processing to the Data: Example 1
Obtain Total Sales By Product (Move Processing to the Data)
Denodo Strategy:
30k rows moved through the network
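The difference between the two strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Denodo's implementation: the sales rows are assumed to be split across two hypothetical sources (two in-memory lists), and `naive_total_sales`, `source_group_by` and `pushdown_total_sales` are invented names. Both strategies return the same totals; only the number of rows crossing the simulated network differs, because with pushdown each source aggregates locally and the DV layer merely merges partial sums.

```python
from collections import defaultdict

# Hypothetical sale row: (product_id, amount).
Row = tuple[int, float]


def naive_total_sales(sources: list[list[Row]]) -> tuple[dict[int, float], int]:
    """Naive strategy: ship every raw row to the DV layer, aggregate there."""
    moved = 0
    totals: dict[int, float] = defaultdict(float)
    for source in sources:
        for product_id, amount in source:
            moved += 1  # every sale row crosses the network
            totals[product_id] += amount
    return dict(totals), moved


def source_group_by(rows: list[Row]) -> dict[int, float]:
    """What a source computes locally for a pushed-down GROUP BY product_id."""
    partial: dict[int, float] = defaultdict(float)
    for product_id, amount in rows:
        partial[product_id] += amount
    return dict(partial)


def pushdown_total_sales(sources: list[list[Row]]) -> tuple[dict[int, float], int]:
    """Pushdown strategy: each source aggregates; the DV layer merges partial sums."""
    moved = 0
    totals: dict[int, float] = defaultdict(float)
    for source in sources:
        partial = source_group_by(source)  # runs "inside" the source
        moved += len(partial)  # only one row per product crosses the network
        for product_id, subtotal in partial.items():
            totals[product_id] += subtotal
    return dict(totals), moved


# Usage with made-up data from two hypothetical sources:
dw_rows = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 1.0)]
hadoop_rows = [(1, 2.0), (2, 3.0), (2, 4.0)]
print(naive_total_sales([dw_rows, hadoop_rows]))     # ({1: 17.0, 2: 14.0, 3: 1.0}, 7)
print(pushdown_total_sales([dw_rows, hadoop_rows]))  # ({1: 17.0, 2: 14.0, 3: 1.0}, 5)
```

At the scale on the slides the same effect turns hundreds of millions of transferred rows into tens of thousands.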
Move Processing to the Data: Example 2
Maximum Sales Discount By Product in the last year: On-the-fly Data Movement
Move the products data to a temp table in the DW: 20K rows moved through the network + 10K rows inserted in the DW
Execute the full query on the DW: 10K rows through the network
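A minimal sketch of on-the-fly data movement, using Python's built-in `sqlite3` module with two in-memory databases as stand-ins for the Products DB and the data warehouse. The schema (`products`, `sales`, `tmp_products`), the `since` cutoff standing in for "the last year", and the definition of the discount as `list_price - sale_price` are all assumptions for illustration. The small products table is copied into a temporary table in the warehouse, so the join and the aggregation then run entirely where the large sales table lives.

```python
import sqlite3


def on_the_fly_max_discount(
    products_db: sqlite3.Connection, dw: sqlite3.Connection, since: str
) -> tuple[list[tuple[int, float]], int, int]:
    """Copy products into a DW temp table, then run the whole query in the DW.

    Returns (result rows, product rows copied into the DW, result rows returned).
    """
    # Step 1: read the small products table from its own source ...
    products = products_db.execute(
        "SELECT product_id, list_price FROM products"
    ).fetchall()
    # ... and load it into a temporary table inside the warehouse.
    dw.execute(
        "CREATE TEMP TABLE tmp_products (product_id INTEGER PRIMARY KEY, list_price REAL)"
    )
    try:
        dw.executemany("INSERT INTO tmp_products VALUES (?, ?)", products)
        # Step 2: join + aggregation run entirely inside the warehouse,
        # so only one row per product comes back.
        result = dw.execute(
            """
            SELECT s.product_id, MAX(p.list_price - s.sale_price) AS max_discount
            FROM sales AS s
            JOIN tmp_products AS p ON p.product_id = s.product_id
            WHERE s.sale_date >= ?
            GROUP BY s.product_id
            ORDER BY s.product_id
            """,
            (since,),
        ).fetchall()
    finally:
        # Clean up the temporary table even if the query fails.
        dw.execute("DROP TABLE tmp_products")
    return result, len(products), len(result)
```

This pays off when the table being moved is small relative to the one it joins with; the copy cost (reading the rows out and inserting them into the DW) is what a cost-based optimizer would weigh against the alternatives.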
Move Processing to the Data: Example 2
Maximum Sales Discount By Product in the last year: Partial aggregation Pushdown
Products DB: 10K rows through the network
Data Warehouse: #rows through the network = 10K × average number of sale prices per product
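The partial aggregation pushdown can be sketched the same way, again with `sqlite3` in-memory databases standing in for the two systems. The schema, the `since` cutoff and the discount formula (`list_price - sale_price`) are illustrative assumptions, and `partial_pushdown_max_discount` is a hypothetical name. The warehouse only runs a `GROUP BY product_id, sale_price`, which collapses repeated prices locally; the virtualization layer then joins those partial results with the products and computes the final `MAX`.

```python
import sqlite3


def partial_pushdown_max_discount(
    products_db: sqlite3.Connection, dw: sqlite3.Connection, since: str
) -> tuple[list[tuple[int, float]], int, int]:
    """Push a partial GROUP BY to the DW; finish the join and MAX in the DV layer.

    Returns (result rows, rows read from the Products DB, rows read from the DW).
    """
    # Products DB: one row per product crosses the network.
    list_price = dict(
        products_db.execute("SELECT product_id, list_price FROM products").fetchall()
    )
    # Data warehouse: grouping by (product_id, sale_price) collapses repeated
    # prices locally, so one row per distinct price per product crosses the network.
    pairs = dw.execute(
        """
        SELECT product_id, sale_price FROM sales
        WHERE sale_date >= ?
        GROUP BY product_id, sale_price
        """,
        (since,),
    ).fetchall()
    # DV layer: join the partial results and compute the final MAX per product.
    best: dict[int, float] = {}
    for product_id, sale_price in pairs:
        if product_id not in list_price:
            continue  # inner-join semantics: drop sales without a product row
        discount = list_price[product_id] - sale_price
        if product_id not in best or discount > best[product_id]:
            best[product_id] = discount
    return sorted(best.items()), len(list_price), len(pairs)
```

The number of rows read from the warehouse is the number of distinct (product, price) pairs, which is exactly the "10K × average number of sale prices per product" estimate on the slide.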
Performance: Choosing the Best Execution Plan
How to Choose the Best Execution Plan?
Cost-Based Optimization in Data Virtualization
Must take into account:
Data statistics, to estimate the size of intermediate result sets
Data source indexes (and other physical structures)
Execution model of the data sources: e.g. parallel databases vs. Hadoop clusters vs. relational databases
Features of the data sources (e.g. number of processing cores in a parallel database or Hadoop cluster)
Data transfer rate
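Several of these factors can be folded into a single cost estimate. The toy model below is a hypothetical sketch, far simpler than a real optimizer: it treats a plan's cost as network transfer time (rows transferred divided by the transfer rate) plus source processing time (rows processed divided across the source's cores), and picks the cheapest candidate. Indexes and differences in execution model are deliberately left out, and `CandidatePlan`, `estimated_cost_s` and `choose_plan` are invented names. The usage example borrows the row counts from Example 1; the transfer and processing rates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    rows_transferred: int          # estimated from data statistics
    rows_processed_at_source: int  # work the source must do for this plan
    source_cores: int              # e.g. cores in the parallel DB / Hadoop cluster


def estimated_cost_s(
    plan: CandidatePlan, transfer_rows_per_s: float, rows_per_core_per_s: float
) -> float:
    """Toy cost: network transfer time + parallel processing time at the source."""
    network = plan.rows_transferred / transfer_rows_per_s
    compute = plan.rows_processed_at_source / (rows_per_core_per_s * plan.source_cores)
    return network + compute


def choose_plan(
    plans: list[CandidatePlan], transfer_rows_per_s: float, rows_per_core_per_s: float
) -> CandidatePlan:
    """Return the candidate with the lowest estimated cost."""
    return min(
        plans,
        key=lambda p: estimated_cost_s(p, transfer_rows_per_s, rows_per_core_per_s),
    )


plans = [
    CandidatePlan("naive", 350_000_000, 350_000_000, 32),
    CandidatePlan("full group-by pushdown", 30_000, 350_000_000, 32),
]
best = choose_plan(plans, transfer_rows_per_s=1_000_000, rows_per_core_per_s=10_000_000)
print(best.name)  # full group-by pushdown
```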
Example Scenario: The Numbers
Example Scenario: The Numbers
Best Performance Even When Processing Billions of Rows
Scenario: performance comparison of physical vs. logical approaches on Big Data volumes (TPC-DS benchmark)
Sales (Netezza): 290M rows
Customers (Oracle): 2M rows
Items (SQL Server): 400K rows
Example Scenario: The Numbers
Physical vs Logical DW Performance

Query | Rows Returned | Avg. Time Physical (all data in Netezza) | Avg. Time Logical | Optimization Technique (automatically chosen by Denodo 6.0)
--- | --- | --- | --- | ---
Total sales by customer | 1.99M | 20975 ms | 21457 ms | Full group-by pushdown
Total sales by customer and year between 2000 and 2004 | 5.51M | 52313 ms | 59060 ms | Full group-by pushdown
Total sales by item brand | 31.35K | 4697 ms | 5330 ms | Partial group-by pushdown
Total sales by item where sale price is less than current list price | 17.05K | 3509 ms | 5229 ms | On-the-fly data movement
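The relative overhead of the logical approach follows directly from the average times in the table, as (logical - physical) / physical. A short Python calculation using only those published figures:

```python
# (query, avg. physical time in ms, avg. logical time in ms), from the table
RESULTS = [
    ("Total sales by customer", 20975, 21457),
    ("Total sales by customer and year between 2000 and 2004", 52313, 59060),
    ("Total sales by item brand", 4697, 5330),
    ("Total sales by item where sale price is less than current list price", 3509, 5229),
]


def overhead_pct(physical_ms: float, logical_ms: float) -> float:
    """Extra time of the logical run, as a percentage of the physical run."""
    return 100.0 * (logical_ms - physical_ms) / physical_ms


for query, physical, logical in RESULTS:
    print(f"{overhead_pct(physical, logical):5.1f}%  {query}")
```

This prints overheads of about 2.3%, 12.9%, 13.5% and 49.0%; the largest gap is for the query resolved with on-the-fly data movement.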
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
Find more details at: http://www.datavirtualizationblog.com/myths-in-data-virtualization-performance/

Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps ApproachLunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
 
Achieving Self-Service Analytics with a Governed Data Services Layer
Achieving Self-Service Analytics with a Governed Data Services LayerAchieving Self-Service Analytics with a Governed Data Services Layer
Achieving Self-Service Analytics with a Governed Data Services Layer
 
What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?
 
Mastering Data Compliance in a Dynamic Business Landscape
Mastering Data Compliance in a Dynamic Business LandscapeMastering Data Compliance in a Dynamic Business Landscape
Mastering Data Compliance in a Dynamic Business Landscape
 
Denodo Partner Connect: Business Value Demo with Denodo Demo Lite
Denodo Partner Connect: Business Value Demo with Denodo Demo LiteDenodo Partner Connect: Business Value Demo with Denodo Demo Lite
Denodo Partner Connect: Business Value Demo with Denodo Demo Lite
 
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
 
Drive Data Privacy Regulatory Compliance
Drive Data Privacy Regulatory ComplianceDrive Data Privacy Regulatory Compliance
Drive Data Privacy Regulatory Compliance
 
Знакомство с виртуализацией данных для профессионалов в области данных
Знакомство с виртуализацией данных для профессионалов в области данныхЗнакомство с виртуализацией данных для профессионалов в области данных
Знакомство с виртуализацией данных для профессионалов в области данных
 
Data Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
Data Democratization: A Secret Sauce to Say Goodbye to Data FragmentationData Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
Data Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
 
Denodo Partner Connect - Technical Webinar - Ask Me Anything
Denodo Partner Connect - Technical Webinar - Ask Me AnythingDenodo Partner Connect - Technical Webinar - Ask Me Anything
Denodo Partner Connect - Technical Webinar - Ask Me Anything
 
Lunch and Learn ANZ: Key Takeaways for 2023!
Lunch and Learn ANZ: Key Takeaways for 2023!Lunch and Learn ANZ: Key Takeaways for 2023!
Lunch and Learn ANZ: Key Takeaways for 2023!
 
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way ForwardIt’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
 
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
 
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
 
How to Build Your Data Marketplace with Data Virtualization?
How to Build Your Data Marketplace with Data Virtualization?How to Build Your Data Marketplace with Data Virtualization?
How to Build Your Data Marketplace with Data Virtualization?
 
Webinar #2 - Transforming Challenges into Opportunities for Credit Unions
Webinar #2 - Transforming Challenges into Opportunities for Credit UnionsWebinar #2 - Transforming Challenges into Opportunities for Credit Unions
Webinar #2 - Transforming Challenges into Opportunities for Credit Unions
 
Enabling Data Catalog users with advanced usability
Enabling Data Catalog users with advanced usabilityEnabling Data Catalog users with advanced usability
Enabling Data Catalog users with advanced usability
 
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
 
GenAI y el futuro de la gestión de datos: mitos y realidades
GenAI y el futuro de la gestión de datos: mitos y realidadesGenAI y el futuro de la gestión de datos: mitos y realidades
GenAI y el futuro de la gestión de datos: mitos y realidades
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Big Data: Architecture and Performance Considerations in Logical Data Lakes

  • 1. Architecture and Performance Considerations in the Logical Data Lake. Dr. Alberto Pan, Chief Technical Officer
  • 3. Agenda: 1. Data Lake Architecture; 2. Data Virtualization in the Logical Data Lake; 3. Performance: ‘Move Processing to the Data’; 4. Performance: Choosing the Best Execution Plan; 5. Example Scenario: The Numbers
  • 5. Architecture of the Data Lake (diagram). Sources: sensor data, machine data (logs), social data, clickstream data, internet data, image and video, unstructured enterprise content, enterprise applications (traditional enterprise data), cloud applications. Ingestion: ETL, CDC, Sqoop, Flume, Kafka, …; batch and real-time (on-demand/streaming) data access. Storage and processing: Hadoop (HDFS, YARN/workload management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk; SQL, DW, streams, NoSQL and search access), NoSQL, EDW, in-memory (SAP HANA, …), analytical appliances, cloud DW (Redshift, …), ODS. Consumers: real-time decision management, alerts, scorecards, dashboards, reporting, data discovery, self-service, search, predictive analytics, statistical analytics (R), text analytics, data mining. Cross-cutting: metadata management, data governance, data security.
  • 6. Questions for the Logical Data Lake: How can I combine data from several systems while ensuring good performance? How can I abstract consuming applications away from technology changes and evolving requirements? How can I enforce consistent security and governance policies across the data lake? The Logical Data Lake Architecture: an integrated view of multiple systems (Hadoop, EDW, streaming, in-memory, ...).
  • 7. Data Virtualization (DV) in the Logical Data Lake
  • 9. Architecture of the Logical Data Lake (diagram): the same sources, ingestion paths, storage and processing systems, and consumers as the Data Lake architecture, now accessed through a Data Virtualization layer that provides data abstraction, data transformation, data federation, data services, data caching, data search & discovery, governance, security and optimization, over both batch and real-time (on-demand/streaming) data access.
  • 10. What Is Needed? Requirements for the integration component in the logical data lake. Denodo Data Virtualization is the only option satisfying all of them: the ability to answer ad hoc queries combining data from several systems; performance comparable to physical approaches; the ability to expose different logical views over the same data; a single entry point to apply security and governance policies; comprehensive, granular security support.
  • 12. Move Processing to the Data: process the data locally, where it resides, and let the DV system combine the partial results. This minimizes network traffic and leverages the capabilities of specialized data sources.
  • 13. Move Processing to the Data, Example 1: Obtain Total Sales by Product (Naive Strategy). Naive strategy: 350M rows moved through the network.
  • 14. Move Processing to the Data, Example 1: Obtain Total Sales by Product (Move Processing to the Data). Denodo strategy: 30K rows moved through the network.
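A minimal sketch of the two Example 1 strategies, assuming (hypothetically) that sales rows are split across two sources, each modeled as an in-memory SQLite database with a made-up `sales(product_id, amount)` table. The helper names are illustrative, not Denodo APIs. The naive strategy pulls every row into the virtualization layer; the pushdown strategy runs the `GROUP BY` inside each source and only adds up the per-product partial sums. This works because `SUM` is decomposable; an `AVG`, for instance, would need partial sums and partial counts.

```python
import sqlite3
from collections import defaultdict


def make_source(rows):
    """In-memory SQLite DB standing in for one data source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def naive_total_sales(sources):
    """Naive strategy: move every sales row to the DV layer and aggregate there."""
    totals, transferred = defaultdict(float), 0
    for conn in sources:
        for product_id, amount in conn.execute("SELECT product_id, amount FROM sales"):
            transferred += 1
            totals[product_id] += amount
    return dict(totals), transferred


def pushdown_total_sales(sources):
    """Push the GROUP BY into each source; the DV layer only adds the partial sums."""
    totals, transferred = defaultdict(float), 0
    query = "SELECT product_id, SUM(amount) FROM sales GROUP BY product_id"
    for conn in sources:
        for product_id, partial_sum in conn.execute(query):
            transferred += 1
            totals[product_id] += partial_sum
    return dict(totals), transferred


if __name__ == "__main__":
    # Two hypothetical sources, 1,000 rows each, spread over 10 products.
    src_a = make_source([(i % 10, 1.0) for i in range(1000)])
    src_b = make_source([(i % 10, 2.0) for i in range(1000)])
    naive, naive_rows = naive_total_sales([src_a, src_b])
    pushed, pushed_rows = pushdown_total_sales([src_a, src_b])
    assert naive == pushed  # same answer either way
    print(naive_rows, pushed_rows)  # prints: 2000 20
```

Both plans return identical totals; only the number of rows crossing the (simulated) network changes, which is the effect the slides quantify as 350M versus 30K rows.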
  • 15. Move Processing to the Data, Example 2: Maximum Sales Discount by Product in the Last Year (On-the-Fly Data Movement). Move the products data to a temporary table in the DW: 20K rows moved through the network, plus 10K rows inserted into the DW. Execute the full query on the DW: 10K rows moved through the network.
  • 16. Move Processing to the Data, Example 2: Maximum Sales Discount by Product in the Last Year (Partial Aggregation Pushdown). Products DB: 10K rows moved through the network. Data warehouse: rows moved through the network = 10K × average number of sale prices per product.
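A minimal sketch of both Example 2 strategies. The slides show neither the schema nor the exact discount formula, so this assumes hypothetical tables `products(product_id, list_price)` and `sales(product_id, sale_price, sale_year)`, takes the discount to be `list_price - sale_price`, and filters on a single year; SQLite connections stand in for the Products DB and the data warehouse. The counters in `stats` only count rows handed between the stand-ins, a rough proxy for network traffic.

```python
import sqlite3


def make_sources(products, sales):
    """In-memory stand-ins for the Products DB and the data warehouse (hypothetical schemas)."""
    products_db = sqlite3.connect(":memory:")
    products_db.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, list_price REAL)")
    products_db.executemany("INSERT INTO products VALUES (?, ?)", products)
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE sales (product_id INTEGER, sale_price REAL, sale_year INTEGER)")
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    return products_db, dw


def on_the_fly_movement(products_db, dw, year):
    """Copy products into a DW temp table, then run the whole query inside the DW."""
    rows = products_db.execute("SELECT product_id, list_price FROM products").fetchall()
    dw.execute("CREATE TEMP TABLE tmp_products (product_id INTEGER PRIMARY KEY, list_price REAL)")
    dw.executemany("INSERT INTO tmp_products VALUES (?, ?)", rows)
    result = dw.execute(
        "SELECT s.product_id, MAX(p.list_price - s.sale_price) "
        "FROM sales AS s JOIN tmp_products AS p ON s.product_id = p.product_id "
        "WHERE s.sale_year = ? GROUP BY s.product_id",
        (year,),
    ).fetchall()
    dw.execute("DROP TABLE tmp_products")
    stats = {"read_from_products": len(rows), "inserted_into_dw": len(rows),
             "returned_by_dw": len(result)}
    return dict(result), stats


def partial_aggregation_pushdown(products_db, dw, year):
    """Push a partial GROUP BY into the DW and finish the MAX in the DV layer."""
    list_price = dict(products_db.execute("SELECT product_id, list_price FROM products"))
    # One row per (product, distinct sale price) sold in the requested year.
    partial = dw.execute(
        "SELECT product_id, sale_price FROM sales WHERE sale_year = ? "
        "GROUP BY product_id, sale_price",
        (year,),
    ).fetchall()
    best = {}
    for product_id, sale_price in partial:
        if product_id not in list_price:  # mirror the inner join of the other plan
            continue
        discount = list_price[product_id] - sale_price
        best[product_id] = max(discount, best.get(product_id, discount))
    return best, {"read_from_products": len(list_price), "returned_by_dw": len(partial)}


if __name__ == "__main__":
    products = [(1, 100.0), (2, 50.0)]
    sales = [(1, 90.0, 2015), (1, 70.0, 2015), (1, 70.0, 2015), (2, 45.0, 2015), (2, 10.0, 2014)]
    products_db, dw = make_sources(products, sales)
    moved, moved_stats = on_the_fly_movement(products_db, dw, 2015)
    pushed, pushed_stats = partial_aggregation_pushdown(products_db, dw, 2015)
    assert moved == pushed == {1: 30.0, 2: 5.0}
    print(moved_stats, pushed_stats)
```

The trade-off mirrors the slides: on-the-fly movement pays to ship and insert the product rows but returns one row per product, while partial pushdown returns one DW row per (product, sale price) pair and leaves the final `MAX` to the virtualization layer.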
  • 18. How to Choose the Best Execution Plan? Cost-based optimization in data virtualization must take into account: data statistics, to estimate the size of intermediate result sets; data source indexes (and other physical structures); the execution model of each data source (e.g., parallel databases vs. Hadoop clusters vs. relational databases); features of the data sources (e.g., the number of processing cores in a parallel database or Hadoop cluster); and the data transfer rate.
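A toy illustration of cost-based plan selection, assuming a deliberately simple, hypothetical cost model (not Denodo's optimizer): estimated time = source CPU time divided by the source's cores (perfect parallel speedup assumed) + DV-layer CPU time + rows transferred × bytes per row ÷ transfer rate. The row estimates reuse the Example 1 figures (350M rows naive vs. 30K rows with pushdown); the CPU times, row width and transfer rate are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    """Statistics-based estimates for one candidate plan (hypothetical model)."""
    name: str
    rows_transferred: float    # estimated size of results sent to the DV layer
    source_cpu_seconds: float  # estimated work executed inside the data source
    source_cores: int          # processing cores available in that source
    dv_cpu_seconds: float      # estimated work left for the DV layer


def estimated_seconds(plan: PlanEstimate, bytes_per_row: float, transfer_bytes_per_s: float) -> float:
    """Source time (perfect parallel speedup assumed) + DV time + network transfer time."""
    source_time = plan.source_cpu_seconds / plan.source_cores
    transfer_time = plan.rows_transferred * bytes_per_row / transfer_bytes_per_s
    return source_time + plan.dv_cpu_seconds + transfer_time


def choose_plan(plans, bytes_per_row, transfer_bytes_per_s):
    """Pick the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda p: estimated_seconds(p, bytes_per_row, transfer_bytes_per_s))


if __name__ == "__main__":
    candidates = [
        PlanEstimate("naive", 350e6, source_cpu_seconds=50, source_cores=8, dv_cpu_seconds=200),
        PlanEstimate("group-by pushdown", 30e3, source_cpu_seconds=400, source_cores=8,
                     dv_cpu_seconds=0.1),
    ]
    # Invented figures: 100-byte rows over a link sustaining about 100 MB/s.
    best = choose_plan(candidates, bytes_per_row=100, transfer_bytes_per_s=100e6)
    print(best.name)  # prints: group-by pushdown
```

A real optimizer would derive these inputs from collected statistics, indexes and source capabilities, as listed on the slide; the point here is only that the choice falls out of comparing estimates, not from a fixed rule.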
  • 20. Example Scenario: The Numbers. Best Performance Even When Processing Billions of Rows. Performance comparison of physical vs. logical scenarios with big data volumes, based on the TPC-DS benchmark: Sales in Netezza (290M rows), Customers in Oracle (2M rows), Items in SQL Server (400K rows).
  • 21. Example Scenario: The Numbers. Physical vs. Logical DW Performance (optimization technique chosen automatically by Denodo 6.0):

      Query | Rows returned | Avg. time, physical (all data in Netezza) | Avg. time, logical | Optimization technique
      Total sales by customer | 1.99 M | 20,975 ms | 21,457 ms | Full group-by pushdown
      Total sales by customer and year between 2000 and 2004 | 5.51 M | 52,313 ms | 59,060 ms | Full group-by pushdown
      Total sales by item brand | 31.35 K | 4,697 ms | 5,330 ms | Partial group-by pushdown
      Total sales by item where sale price is less than current list price | 17.05 K | 3,509 ms | 5,229 ms | On-the-fly data movement
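The relative overhead of the logical scenario can be computed directly from the averages in the table above. This sketch only reuses those reported figures; it does not re-run the benchmark.

```python
# (query, avg. time physical in ms, avg. time logical in ms), copied from the table above.
RESULTS = [
    ("Total sales by customer", 20975, 21457),
    ("Total sales by customer and year, 2000-2004", 52313, 59060),
    ("Total sales by item brand", 4697, 5330),
    ("Total sales by item where sale price < current list price", 3509, 5229),
]


def overhead_pct(physical_ms: float, logical_ms: float) -> float:
    """Relative slowdown of the logical (federated) run versus the physical one, in percent."""
    return 100.0 * (logical_ms - physical_ms) / physical_ms


if __name__ == "__main__":
    for query, physical, logical in RESULTS:
        print(f"{query}: +{overhead_pct(physical, logical):.1f}%")
    # prints +2.3%, +12.9%, +13.5% and +49.0% for the four queries, in order
```

By this measure the overhead ranges from about 2% for the full group-by pushdown on total sales by customer to about 49% for the query that relies on on-the-fly data movement.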
  • 22. Thanks! www.denodo.com, info@denodo.com. © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies. Find more details at datavirtualization.blog: http://www.datavirtualizationblog.com/myths-in-data-virtualization-performance/