Are You Killing
the Benefits of
Your Data Lake?
Speakers
Rick van der Lans
Independent Business
Intelligence Analyst
R20 Consultancy
Lakshmi Randall
Director of Product Marketing
Denodo
@LakshmiLJ @rick_vanderlans
Copyright © 2018 R20/Consultancy B.V., The Netherlands 3
Wikipedia: Data science is an interdisciplinary
field of scientific methods, processes,
algorithms and systems to extract
knowledge or insights from data in various
forms, either structured or unstructured,
similar to data mining.
Data Science Steps and Data Preparation
Defining goals
Data selection
Data understanding
Data enrichment
Data cleansing
Data coding
Creating analytical model
Analytics
Understanding results
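The data preparation steps listed above (selection, enrichment, cleansing, coding) can be illustrated with a minimal Python sketch. The records, field names, and lookup tables are hypothetical and are not taken from the presentation; they only show how each step might reshape the data before an analytical model is created.

```python
# Hypothetical raw records; all field names are illustrative assumptions.
raw = [
    {"id": 1, "country": "NL", "age": "34", "segment": "gold", "note": "x"},
    {"id": 2, "country": "US", "age": "", "segment": "silver", "note": "y"},
    {"id": 3, "country": "NL", "age": "51", "segment": "gold", "note": "z"},
]

REGION = {"NL": "EMEA", "US": "AMER"}      # assumed lookup used for enrichment
SEGMENT_CODE = {"silver": 0, "gold": 1}    # assumed encoding used for coding


def select(rows):
    """Data selection: keep only the fields the analysis needs."""
    wanted = ("id", "country", "age", "segment")
    return [{k: r[k] for k in wanted} for r in rows]


def enrich(rows):
    """Data enrichment: add a derived region attribute."""
    return [{**r, "region": REGION.get(r["country"], "UNKNOWN")} for r in rows]


def cleanse(rows):
    """Data cleansing: drop rows with a missing age and convert age to int."""
    return [{**r, "age": int(r["age"])} for r in rows if r["age"].strip()]


def code(rows):
    """Data coding: map categorical values to numeric codes."""
    return [{**r, "segment": SEGMENT_CODE[r["segment"]]} for r in rows]


prepared = code(cleanse(enrich(select(raw))))
```

In this sketch the second record is dropped during cleansing because its age is empty, which hints at why preparation tends to dominate the effort.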
Data Preparation is Time-Consuming
Source: Gill Press, “Cleaning Big Data: Most Time-Consuming,
Least Enjoyable Data Science Task, Survey Says”, March 2016
Common Definition of Data Lake
James Serra:
A “data lake” is a storage repository, usually in Hadoop, that holds a
vast amount of raw data in its native format until it is needed. It’s a
great place for investigating, exploring, experimenting, and refining
data, in addition to archiving data.
Source: http://www.jamesserra.com/archive/2015/04/what-is-a-data-lake/
The Logical Data Lake
All
data sources
Investigative analytics
Data lake
Data science
Challenges of a Physical Data Lake
Complex “T” (the transformation step of ETL) moved to data usage
Big data too big to move
• Too slow to copy and bandwidth issues
Uncooperative departments - company politics
Restrictive data privacy and protection regulations
Data in data lake is stored outside original security realm
Missing metadata to describe data
Some sources are hard to copy
• For example, mainframe data
Refreshing of data lake
Management of data lake required
…
Data lake
The Logical (Virtual) Data Lake
Data sources
ETL ETL Cached Cached
Logical Data Lake
Data science and
investigative users
Data is too valuable an
asset to be used for
reporting only.
A Multitude of Data Delivery Systems
The classic data warehouse
architecture
The data lake
The data marketplace
Data services
Managed file transfer
Data streaming
…
Drawback: Replicated Specifications
Data warehouse
Data lake
Data marketplace
Data streaming
Data file transfer
Data services
Drawback: Replicated Specifications
Source
System 1
Source
System 2
Data warehouse
Data lake
Data services
Analytics & reporting
Data science
App
=
=
Siloed Data Delivery Systems
Landing Zone
Curated Zone
Production Zone
Data sources
Business users
A Physical Data Lake With Multiple Zones
The Logical Data Warehouse Architecture
Enterprise data layer
Data consumption
layer
Data source
layer
Data Virtualization
Data Virtualization Server
Source systems
Curated zone
Production
zone
Landing zone
Data Scientists and
other Business users
The Logical, Multi-Purpose Data Lake
Key Features Missing in SQL-on-Hadoop Engines
Allowing applications and users to access all the data
through interfaces other than SQL
Allowing all types of data sources to be accessed
Detailed lineage and impact analysis capabilities
A searchable data catalog
Advanced query optimization techniques for
federated queries
Advanced query pushdown and parallel processing
capabilities
Centralized data security
Single-Purpose versus Multi-Purpose Data Lake (1)
The Single-Purpose Data Lake
• Not always practical or feasible
• The data in a data lake is potentially too valuable to be used by data
scientists exclusively
• Other user groups may be interested in the data lake
• Siloed data delivery system operating independently of others
• Maintaining multiple physical layers of lakes is complex
Single-Purpose versus Multi-Purpose Data Lake (2)
The Multi-Purpose Data Lake
• Some data is physically stored centrally (through copying or caching), and
some is accessed remotely
• The data offered can be accessed by any type of business user
• The data in the data sources can be transformed to any form that is required
by other user groups
• A logical, multi-purpose data lake can be the foundation for several data
delivery systems
• Working with logical layers is easy to manage and maintain
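The idea that some data is cached centrally while other data is accessed remotely, all behind one logical access point, can be sketched in Python. The class names `VirtualView` and `LogicalDataLake` and the two sample sources are hypothetical; this is not the API of any data virtualization product, only an illustration of the principle.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class VirtualView:
    """A named view over a source; rows come from the source or from a cache."""

    def __init__(self, name: str, fetch: Callable[[], List[Row]], cached: bool = False):
        self.name = name
        self._fetch = fetch            # stands in for a query against a remote source
        self._cached = cached
        self._cache: Optional[List[Row]] = None
        self.remote_calls = 0          # counts how often the source was actually hit

    def rows(self) -> List[Row]:
        if self._cached and self._cache is not None:
            return self._cache         # served from the central cache
        self.remote_calls += 1
        data = self._fetch()
        if self._cached:
            self._cache = data
        return data


class LogicalDataLake:
    """Single access point; consumers do not see where the data physically lives."""

    def __init__(self) -> None:
        self._views: Dict[str, VirtualView] = {}

    def register(self, view: VirtualView) -> None:
        self._views[view.name] = view

    def query(self, name: str) -> List[Row]:
        return self._views[name].rows()


# Hypothetical sources: one accessed remotely on every query, one cached.
lake = LogicalDataLake()
orders = VirtualView("orders", lambda: [{"order_id": 1, "amount": 250}])
sensors = VirtualView("sensors", lambda: [{"sensor": "s1", "value": 0.7}], cached=True)
lake.register(orders)
lake.register(sensors)

for _ in range(3):
    lake.query("orders")
    lake.query("sensors")
```

After the loop, the uncached view has contacted its source three times and the cached view once, while the consumer code is identical for both.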
Advantages of Multi-Purpose Data Lakes
Reduction of development costs
• Metadata specifications are defined once and reused many times
• Analytical solutions developed by one data scientist can easily be reused
• Data-related solutions developed by non-data scientists can be reused
Acceleration of development
• Data scientists don’t need to spend time on data selection
• Physically copying data is not mandatory, but optional
• Business users don’t have to learn the technical languages and APIs of the original data sources
Increase report and analytical consistency
• Reusing analytical and data-related solutions improves the reporting and analytical consistency
• Definitions, descriptions, tags, and categories can be centrally cataloged
• Access to all the data can be centrally secured
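As a rough illustration of cataloging definitions and tags centrally and securing access in one place, the following Python sketch keeps descriptions, tags, and allowed roles in a single structure. The `Catalog` class, the entry names, and the role names are assumptions made for this example and do not describe any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)
    allowed_roles: Set[str] = field(default_factory=set)


class Catalog:
    """One central place for descriptions, tags, and access rules."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def add(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Return names of entries whose description or tags contain the term."""
        t = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if t in e.description.lower() or any(t in tag.lower() for tag in e.tags)
        )

    def can_access(self, role: str, name: str) -> bool:
        """Central check, applied regardless of the underlying data source."""
        entry = self._entries.get(name)
        return entry is not None and role in entry.allowed_roles


# Hypothetical catalog content.
catalog = Catalog()
catalog.add(CatalogEntry(
    "customer_360", "Single view of the customer",
    tags={"customer", "pii"},
    allowed_roles={"data_scientist", "compliance_analyst"},
))
catalog.add(CatalogEntry(
    "well_production", "Daily production volumes per well",
    tags={"operations"},
    allowed_roles={"data_scientist", "store_manager"},
))
```

Because every consumer goes through the same `search` and `can_access` calls, a definition or access rule changed here takes effect for all user groups at once.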
Time to Tear the Silos Down!
Data Virtualization
Shhh… the ugly little secret is that big data deployment is hard!
Big Data Hadoop Deployments
Fifty Shades of Data Management
Transactional
Systems
Data Warehouses
SAS Applications
Data Hubs
OLAP Sources
Data Catalogs
Micro Services
Data Marts
Streams/
Queues/CDC
Data Lakes
Black Hole
Copyright © Intelligent Solutions, Inc. 2018 All Rights Reserved
▪ For companies to benefit from their analytic efforts, data must be:
▪ Easily located (wherever it resides)
▪ Easily understood (with all its context in place)
▪ Easily accessed (query performance is critical)
▪ Easily audited (its lifecycle is clear to both IT and business users)
▪ Appropriately provisioned for analysis (its management is known)
The Analytics Environment Must Have a Brain…
A Few Simple Rules…
1. Build a business strategy rather than a big data strategy
2. Big data is really about small
3. Users come in all shapes and sizes
• Who are they? What data do they need? What flexibility do
they want?
4. Connect to all of the data (but start with the most important)
• What data is needed by the users? Open access or pre-aggregated or pre-calculated?
5. Use the language that the business understands
• Don’t force people to change terminology…support multiple
models, e.g., to Finance it’s an ‘account’, to Customer Care
it’s a ‘customer’.
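Rule 5 above, supporting multiple models over the same data, can be sketched as a simple column-renaming projection. The base record, the field names, and the two vocabularies are hypothetical; the sketch only shows how one source could be presented as an ‘account’ to Finance and as a ‘customer’ to Customer Care.

```python
# Hypothetical base record as held in a source system.
BASE_ROWS = [{"party_id": 42, "party_name": "Acme BV", "balance": 1200.0}]

# Assumed per-audience vocabularies: same data, different business terms.
MODELS = {
    "finance": {
        "party_id": "account_id",
        "party_name": "account_name",
        "balance": "balance",
    },
    "customer_care": {
        "party_id": "customer_id",
        "party_name": "customer_name",
    },
}


def view_for(audience, rows):
    """Project and rename columns into the terminology of one audience."""
    mapping = MODELS[audience]
    return [{new: row[old] for old, new in mapping.items()} for row in rows]


finance_view = view_for("finance", BASE_ROWS)
care_view = view_for("customer_care", BASE_ROWS)
```

Both views are derived from the same rows, so neither group has to change its terminology and no data is copied.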
Self-Service With Guardrails
• Don’t build just for the ‘data cowboys’
• Create pre-integrated, pre-calculated data
• Eliminates this burden for the users
• Ensures consistency of calculations, etc.
• But allow the cowboys to ‘roam and wrangle’
• Even the cowboys can only access ‘approved’ data
sources
A Single, Logical, Multi-Purpose Data Lake
Product Traceability
Product Innovation
Risk Management
Pricing Optimization
Virtual Sandbox
Single View
Data as a Service
Operational Excellence
M & A
BI/Reporting
DATA
VIRTUALIZATION
Data Scientist
Call Center Analyst
Store Manager
Compliance Analyst
Shop Floor Supervisor
Multi-Purpose Data Lake With Data Virtualization
ENABLING TECHNOLOGIES
Virtualize Data, Don’t Migrate it
• Distributed heterogeneity is a challenge for the MDA
– Plague of data standards, models, quality metrics, interfaces…
• Consolidating diverse data is not a compelling solution
– Migration & consolidation alleviate complexity, but have other problems
– Time consuming, risky, disruptive, distracting
• DV is an effective alternative to consolidation
– Fraction of the time, risk, cost and disruption of
migration and consolidation projects
– Software/hardware advances give DV the
speed/scale required of most SLAs & use cases
Big Data Queries Faster With Denodo Platform
Performance comparison of 5 different queries
1. Data virtualization delivers better performance without the need to replicate data into Hadoop.
2. Data virtualization leverages data source architectures for what they are good at.
Query     Impala Hadoop-only   Denodo        Denodo Runtime
          Runtime (s)          Runtime (s)   w/ Cache (s)
Query 1   199                  120           68
Query 2   187                  96            88
Query 3   120                  212           115
Query 4   timeout              328           69
Query 5   46                   91            56

Data Volumes
Queries 1, 2, 3, 5
• Exadata Row Count: ~5M
• Impala Row Count: ~500k
Query 4
• Exadata Row Count: ~5M
• Impala Row Count: ~2M
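To make the runtime comparison easier to read, the following Python sketch computes the ratio of the Impala Hadoop-only runtime to each Denodo runtime, using the five sets of figures reported on this slide. The `speedup` helper is introduced here for illustration only; a ratio above 1.0 means the Denodo run was faster, and no ratio is computed for Query 4 because the Impala run timed out.

```python
# Runtimes in seconds as reported on the slide:
# (Impala Hadoop-only, Denodo, Denodo with cache). None marks the Impala timeout.
RUNTIMES = {
    "Query 1": (199, 120, 68),
    "Query 2": (187, 96, 88),
    "Query 3": (120, 212, 115),
    "Query 4": (None, 328, 69),
    "Query 5": (46, 91, 56),
}


def speedup(baseline, candidate):
    """Ratio baseline / candidate; values above 1.0 mean the candidate is faster."""
    if baseline is None:
        return None  # no ratio can be computed against a timeout
    return baseline / candidate


for query, (impala, denodo, denodo_cached) in RUNTIMES.items():
    plain = speedup(impala, denodo)
    cached = speedup(impala, denodo_cached)
    if plain is None:
        print(f"{query}: Impala timed out, no ratio")
    else:
        print(f"{query}: {plain:.2f}x without cache, {cached:.2f}x with cache")
```

Ratios below 1.0, such as the uncached runtime for Query 3, indicate that the Impala-only run was faster for that query, so the figures are best read query by query.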
www.anadarko.com ANADARKO PETROLEUM CORPORATION
Anadarko employs
approximately 4,500 men and
women and invested about $4
billion in 2017 to find and
develop the oil and natural gas
resources that are essential to
modern life
COMPANY PROFILE
Changing Commodity Cycle
better data
HONED FOCUS
faster data
ADJUSTED ORG
more data
ENHANCED TECH
Self-Service Data Delivery Environment
Examples
• reduced ad valorem taxes for finance
• improved (production) completion design from multi-variate analysis using virtual views
• more (combined) access to vendor subscription data exploration for competitor
intelligence
To create and use data services for analytics, reports, and apps
Results (from 2017 roll-out/implementation)…
20 corporate repositories; several non-corporate
200+ corporate views; 100+ user-defined views
30 developers using/trained
150 direct users; ~700 indirect users
Data Architecture at Anadarko
Why Multi-Purpose Data Lake?
• Surface all company data without the need to replicate
all data to the Hadoop lake
• Improve governance and metadata management to avoid
“data swamps”
• Allow for on-demand combination of real-time (from the
original sources) with historical data (in the cluster)
• Leverage the processing power of the existing data lake
clusters using Denodo’s optimizer
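The on-demand combination of real-time and historical data mentioned above can be sketched as a union over two row sets split at a cutoff date. The `HISTORICAL` rows, the `fetch_live` function, and the field names are hypothetical stand-ins; the sketch does not show how Denodo or any other platform implements this.

```python
from datetime import date
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical historical rows already stored in the cluster.
HISTORICAL: List[Row] = [
    {"day": date(2018, 1, 1), "units": 100},
    {"day": date(2018, 1, 2), "units": 120},
]


def fetch_live() -> List[Row]:
    """Stand-in for a query against the original operational source."""
    return [{"day": date(2018, 1, 3), "units": 90}]


def combined_view(cutoff: date, live: Callable[[], List[Row]] = fetch_live) -> List[Row]:
    """Union historical rows up to the cutoff with live rows after it."""
    old = [r for r in HISTORICAL if r["day"] <= cutoff]
    new = [r for r in live() if r["day"] > cutoff]
    return sorted(old + new, key=lambda r: r["day"])


rows = combined_view(date(2018, 1, 2))
total_units = sum(r["units"] for r in rows)
```

The consumer sees one ordered result, and the cutoff keeps rows from being counted twice when the historical store and the live source overlap.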
- Source: “Forrester Wave™: Big Data Fabric Q4 2016”
“Denodo’s key strength is delivering a unified and centralized data
services fabric with security and real-time integration across
multiple traditional and big data sources, including Hadoop,
NoSQL, cloud, and software-as-a-service (SaaS).”
Gartner Gives DV Its Highest Maturity Rating
“Data
Virtualization can
be deployed with
low risk and
effort to achieve
maximum value.”
Q&A
Thank you!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.

More Related Content

What's hot

SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
Denodo
 
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
Eric Javier Espino Man
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solves
Denodo
 
Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)
Denodo
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Denodo
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and more
Denodo
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Denodo
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Stephen Alex
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Data Con LA
 
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBData Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Denodo
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
MapR Technologies
 
In Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data ScenariosIn Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data Scenarios
Denodo
 
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Denodo
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)
Denodo
 
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
Denodo
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data Warehouse
Osama Hussein
 
Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...
Zaloni
 
GDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationGDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data Virtualization
Denodo
 
Data Warehouse Logical Design Guide
Data Warehouse Logical Design GuideData Warehouse Logical Design Guide
Data Warehouse Logical Design Guide
Andy Yuan
 
Applying Big Data Superpowers to Healthcare
Applying Big Data Superpowers to HealthcareApplying Big Data Superpowers to Healthcare
Applying Big Data Superpowers to Healthcare
Paul Boal
 

What's hot (20)

SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
 
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...White paper   making an-operational_data_store_(ods)_the_center_of_your_data_...
White paper making an-operational_data_store_(ods)_the_center_of_your_data_...
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solves
 
Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and more
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBData Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
 
In Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data ScenariosIn Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data Scenarios
 
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)
 
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data Warehouse
 
Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...
 
GDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationGDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data Virtualization
 
Data Warehouse Logical Design Guide
Data Warehouse Logical Design GuideData Warehouse Logical Design Guide
Data Warehouse Logical Design Guide
 
Applying Big Data Superpowers to Healthcare
Applying Big Data Superpowers to HealthcareApplying Big Data Superpowers to Healthcare
Applying Big Data Superpowers to Healthcare
 

Similar to Are You Killing the Benefits of Your Data Lake?

Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified Insights
Denodo
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
Inside Analysis
 
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Denodo
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
Edgar Alejandro Villegas
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
Denodo
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
DATAVERSITY
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
Denodo
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information Architecture
Inside Analysis
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
DataWorks Summit/Hadoop Summit
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB
 
Die Big Data Fabric als Enabler für Machine Learning & AI
Die Big Data Fabric als Enabler für Machine Learning & AIDie Big Data Fabric als Enabler für Machine Learning & AI
Die Big Data Fabric als Enabler für Machine Learning & AI
Denodo
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Denodo
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
DataWorks Summit
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
DATAVERSITY
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
Denodo
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Denodo
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Denodo
 

Similar to Are You Killing the Benefits of Your Data Lake? (20)

Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified Insights
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
Are You Killing the Benefits of Your Data Lake?

  • 1. Are You Killing the Benefits of Your Data Lake?
  • 2. Speakers: Rick van der Lans, Independent Business Intelligence Analyst, R20 Consultancy (@rick_vanderlans); Lakshmi Randall, Director of Product Marketing, Denodo (@LakshmiLJ)
  • 3. Copyright © 2018 R20/Consultancy B.V., The Netherlands. Wikipedia: Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.
  • 4. Data Science Steps and Data Preparation: Defining goals; Data selection; Data understanding; Data enrichment; Data cleansing; Data coding; Creating analytical model; Analytics; Understanding results
  • 5. Data Preparation is Time-Consuming. Source: Gill Press, “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, March 2016
  • 6. Common Definition of Data Lake. James Serra: A “data lake” is a storage repository, usually in Hadoop, that holds a vast amount of raw data in its native format until it is needed. It’s a great place for investigating, exploring, experimenting, and refining data, in addition to archiving data. Source: http://www.jamesserra.com/archive/2015/04/what-is-a-data-lake/
  • 7. The Logical Data Lake: All data sources; Data lake; Investigative analytics; Data science
  • 8. Challenges of a Physical Data Lake: Complex “T” moved to data usage; Big data too big to move (too slow to copy and bandwidth issues); Uncooperative departments - company politics; Restricting data privacy and protection regulations; Data in data lake is stored outside original security realm; Missing metadata to describe data; Some sources are hard to copy (for example, mainframe data); Refreshing of data lake; Management of data lake required; …
  • 9. The Logical (Virtual) Data Lake: Data sources; ETL; Cached; Logical Data Lake; Data science and investigative users
  • 10. Data is too valuable an asset to be used for reporting only.
  • 11. A Multitude of Data Delivery Systems: The classic data warehouse architecture; The data lake; The data marketplace; Data services; Managed file transfer; Data streaming; …
  • 12. Drawback: Replicated Specifications. Data warehouse; Data lake; Data marketplace; Data streaming; Data file transfer; Data services
  • 13. Drawback: Replicated Specifications. Source System 1; Source System 2; Data warehouse; Data lake; Data services; Analytics & reporting; Data science; App
  • 14. Siloed Data Delivery Systems
  • 15. A Physical Data Lake With Multiple Zones: Data sources; Landing Zone; Curated Zone; Production Zone; Business users
  • 16. The Logical Data Warehouse Architecture: Data source layer; Enterprise data layer; Data consumption layer; Data Virtualization
  • 17. The Logical, Multi-Purpose Data Lake: Source systems; Landing zone; Curated zone; Production zone; Data Virtualization Server; Data Scientists and other Business users
  • 18. Key Features Missing in SQL-on-Hadoop Engines: Allowing applications and users to access all the data through an interface other than SQL; Allowing all types of data sources to be accessed; Detailed lineage and impact analysis capabilities; A searchable data catalog; Advanced query optimization techniques for federated queries; Advanced query pushdown and parallel processing capabilities; Centralized data security
  • 19. Single-Purpose versus Multi-Purpose Data Lake (1). The Single-Purpose Data Lake: • Not always practical or feasible • The data in a data lake is potentially too valuable to be used by data scientists exclusively • Other user groups may be interested in the data lake • Siloed data delivery system operating independently of others • Multiple physical layers of lakes are complex
  • 20. Single-Purpose versus Multi-Purpose Data Lake (2). The Multi-Purpose Data Lake: • Some data is physically stored centrally (through copying or caching), and some is accessed remotely • The data offered can be accessed by any type of business user • The data in the data sources can be transformed to any form that is required by other user groups • A logical, multi-purpose data lake can be the foundation for several data delivery systems • Working with logical layers is easy to manage and maintain
  • 21. Advantages of Multi-Purpose Data Lakes. Reduction of development costs: • Metadata specifications are defined once and reused many times • Analytical solutions developed by one data scientist can easily be reused • Data-related solutions developed by non-data scientists can be reused. Acceleration of development: • Data scientists don’t need to spend time on data selection • Physically copying data is not mandatory, but optional • Business users don’t have to learn the technical languages and APIs of the original data sources. Increase in report and analytical consistency: • Reusing analytical and data-related solutions improves the reporting and analytical consistency • Definitions, descriptions, tags, and categories can be centrally cataloged • Access to all the data can be centrally secured
  • 22. Time to Tear the Silos Down!
  • 24. Shhh… the ugly little secret is that big data deployment is hard! Big Data Hadoop Deployments
  • 25. Fifty Shades of Data Management: Transactional Systems; Data Warehouses; SAS Applications; Data Hubs; OLAP Sources; Data Catalogs; Micro Services; Data Marts; Streams/Queues/CDC; Data Lakes; Black Hole
  • 26. The Analytics Environment Must Have a Brain… For companies to benefit from their analytic efforts, data must be: ▪ Easily located (wherever it resides) ▪ Easily understood (with all its context in place) ▪ Easily accessed (query performance is critical) ▪ Easily audited (its lifecycle is clear to both IT and business users) ▪ Appropriately provisioned for analysis (its management is known). Copyright © Intelligent Solutions, Inc. 2018. All Rights Reserved.
  • 27. A Few Simple Rules… 1. Build a business strategy rather than a big data strategy 2. Big data is really about small 3. Users come in all shapes and sizes • Who are they? What data do they need? What flexibility do they want? 4. Connect to all of the data (but start with the most important) • What data is needed by the users? Open access or pre-aggregated or pre-calculated? 5. Use the language that the business understands • Don’t force people to change terminology… support multiple models, e.g., to Finance it’s an ‘account’, to Customer Care it’s a ‘customer’.
  • 28. Self-Service With Guardrails • Don’t build just for the ‘data cowboys’ • Create pre-integrated, pre-calculated data • Eliminates this burden from the users • Ensures consistency of calculations, etc. • But allow the cowboys to ‘roam and wrangle’ • Even the cowboys can only access ‘approved’ data sources
  • 29. A Single, Logical, Multi-Purpose Data Lake. Use cases: Product Traceability; Product Innovation; Risk Management; Pricing Optimization; Virtual Sandbox; Single View; Data as a Service; Operational Excellence; M & A; BI/Reporting. DATA VIRTUALIZATION. Users: Data Scientist; Call Center Analyst; Store Manager; Compliance Analyst; Shop Floor Supervisor
  • 30. Multi-Purpose Data Lake With Data Virtualization
  • 31. ENABLING TECHNOLOGIES: Virtualize Data, Don’t Migrate It • Distributed heterogeneity is a challenge for the MDA – Plague of data standards, models, quality metrics, interfaces… • Consolidating diverse data is not a compelling solution – Migration & consolidation alleviate complexity, but have other problems – Time consuming, risky, disruptive, distracting • DV is an effective alternative to consolidation – Fraction of the time, risk, cost and disruption of migration and consolidation projects – Software/hardware advances give DV the speed/scale required of most SLAs & use cases
  • 32. Big Data Queries Faster With Denodo Platform. Performance comparison of 5 different queries:
      Query     Impala Hadoop-only runtime (s)   Denodo runtime (s)   Denodo runtime w/ cache (s)
      Query 1   199                              120                  68
      Query 2   187                              96                   88
      Query 3   120                              212                  115
      Query 4   timeout                          328                  69
      Query 5   46                               91                   56
    Data volumes: Queries 1, 2, 3, 5: Exadata row count ~5M, Impala row count ~500k. Query 4: Exadata row count ~5M, Impala row count ~2M.
    1. Data virtualization delivers better performance without the need to replicate data into Hadoop. 2. Data virtualization leverages data source architectures for what they are good at.
  • 33. COMPANY PROFILE (www.anadarko.com, ANADARKO PETROLEUM CORPORATION): Anadarko employs approximately 4,500 men and women and invested about $4 billion in 2017 to find and develop the oil and natural gas resources that are essential to modern life
  • 34. Changing Commodity Cycle: better data, HONED FOCUS; faster data, ADJUSTED ORG; more data, ENHANCED TECH
  • 35. Self-Service Data Delivery Environment: To create and use data services for analytics, reports, and apps. Results (from 2017 roll-out/implementation)… 20 corporate repositories; several non-corporate; 200+ corporate views; 100+ user-defined views; 30 developers using/trained; 150 direct users; ∼700 indirect users. Examples • reduced ad valorem taxes for finance • improved (production) completion design from multi-variate analysis using virtual views • more (combined) access to vendor subscription data exploration for competitor intelligence
  • 36. Data Architecture at Anadarko
  • 37. Why Multi-Purpose Data Lake? • Surface all company data without the need to replicate all data to the Hadoop lake • Improve governance and metadata management to avoid “data swamps” • Allow for on-demand combination of real-time (from the original sources) with historical data (in the cluster) • Leverage the processing power of the existing data lake clusters using Denodo’s optimizer
  • 38. “Denodo’s key strength is delivering a unified and centralized data services fabric with security and real-time integration across multiple traditional and big data sources, including Hadoop, NoSQL, cloud, and software-as-a-service (SaaS).” - Source: “Forrester Wave™: Big Data Fabric Q4 2016”
  • 39. Gartner Gives DV Its Highest Maturity Rating: “Data Virtualization can be deployed with low risk and effort to achieve maximum value.”
  • 40. Q&A
  • 41. Thank you! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
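
The runtimes reported on slide 32 can be turned into relative speedups with a few lines of arithmetic. The sketch below is illustrative only: the numbers are copied from the slide, while the names `RUNTIMES`, `speedup`, and `summarize` are assumptions introduced here and are not part of the presentation or of any Denodo API. A ratio above 1 means the Denodo run was faster than the Impala Hadoop-only run; the Impala timeout on Query 4 is treated as "no ratio available".

```python
# Runtimes in seconds, copied from the slide 32 table, in the order
# (Impala Hadoop-only, Denodo, Denodo with cache). None marks the Impala timeout.
RUNTIMES = {
    "Query 1": (199, 120, 68),
    "Query 2": (187, 96, 88),
    "Query 3": (120, 212, 115),
    "Query 4": (None, 328, 69),
    "Query 5": (46, 91, 56),
}


def speedup(baseline, candidate):
    """Return baseline / candidate rounded to 2 decimals, or None if the baseline timed out."""
    if baseline is None:
        return None
    return round(baseline / candidate, 2)


def summarize(runtimes):
    """Map each query to (speedup without cache, speedup with cache) relative to Impala."""
    return {
        query: (speedup(impala, denodo), speedup(impala, cached))
        for query, (impala, denodo, cached) in runtimes.items()
    }


if __name__ == "__main__":
    for query, (plain, cached) in summarize(RUNTIMES).items():
        print(query, "without cache:", plain, "with cache:", cached)
```

Read this way, the table shows the uncached Denodo runs ahead of Impala on Queries 1 and 2 and behind it on Queries 3 and 5, so the ratios are worth checking per query rather than in aggregate.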