Copyright 2021 Hired Brains Research and Neil Raden
Data Lake/Lakehouse/Cloud Data Warehouse: Which is Real?
By Neil Raden, Founder, Hired Brains Research
Data Lake:
Hadoop (and later, other cloud storage options), which was indifferent to the size and type of the files it could process, in contrast to the rigid and far less scalable relational data warehouse, hatched the idea of a single place for everything: the data lake. In truth, it was a concept the Hadoop distributors promoted to sell more licenses. Though it did simplify searching for and locating files, it provided no analytical processing tools at all. Moving a JSON file from Paris, France, to a cloud location in Paris, Texas, adds no value except for some economies of scale in storage and processing.
The data lake collects raw data: thousands, perhaps millions, of files. This is posited as a benefit. But is it really? At a certain level, "raw data" is an oxymoron. We can't triangulate raw data to see whether it is consistent with other instances of the same phenomenon or event. "Raw data" typically implies that it is to be used for a particular purpose and that it is the starting point for drawing inferences and conclusions. But the context of data (why, how, and when it was recorded, and by what method it was collected and then transformed) is essential. Context-free data simply does not exist. The perfect objectivity we assign to "raw data" is a myth. That's why, in data warehousing, we attempted to integrate and rationalize things.
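To make the point about context concrete, here is a minimal, hypothetical sketch in Python. Two sensors in the same room report 21.0 and 69.8. As raw numbers they look inconsistent; only the recorded unit lets us rationalize and triangulate them. The `Reading` type, its fields, the sensor names, the timestamps, and the 0.5-degree tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A hypothetical measurement that carries its own context."""
    value: float
    unit: str          # how it was measured: "C" or "F"
    source: str        # which device recorded it
    recorded_at: str   # when it was recorded (ISO 8601)


def to_celsius(r: Reading) -> float:
    """Rationalize a reading to Celsius using its recorded unit."""
    if r.unit == "C":
        return r.value
    if r.unit == "F":
        return (r.value - 32) * 5 / 9
    raise ValueError(f"unknown unit: {r.unit!r}")


def consistent(a: Reading, b: Reading, tolerance: float = 0.5) -> bool:
    """Triangulate two readings of the same event; possible only with context."""
    return abs(to_celsius(a) - to_celsius(b)) <= tolerance


room_a = Reading(21.0, "C", "sensor-a", "2021-06-01T12:00:00Z")
room_b = Reading(69.8, "F", "sensor-b", "2021-06-01T12:00:00Z")

print(abs(room_a.value - room_b.value))  # the raw values differ widely
print(consistent(room_a, room_b))        # with context, they agree
```

Strip the `unit` field and the two readings can no longer be reconciled; the data was never "raw" in the sense of being context-free.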
Industry analyst Andrew Brust, in "Big on Data," quotes George Fraser, CEO of
Fivetran:
"I think 2021 will reveal the need for data lakes in the modern data stack is
shrinking...there are no longer new technical reasons for adopting data lakes because
data warehouses that separate compute from storage have emerged." If that's not
categorical enough for you, Fraser sums things up thus: "In the world of the modern
data stack, data lakes are not the optimal solution. They are becoming legacy
technology."
For organizations that lack a cloud-native data warehouse that separates compute from storage, or that lack a cloud strategy altogether, that is something of an oversimplification. Calculating the costs of hybrid cloud, multi-cloud, and the separation of storage from compute borders on alchemy. And even a good approximation is only good at the moment you make it, because things change so quickly. There is one secret, though: no matter what approach you take, you will do worse without a model.
Another thing to consider is that "organization" is often an oxymoron. While most organizations may have a single "strategy" for data architecture, the result of acquisitions, legacy systems, geography, and the usual punctuated progress is often a collection of architectures, distributed both physically and logically. The best advice is: pay more attention to what your data means than to where you put it.
To patch some of the data lake idea's manifest deficiencies, cloud providers have regularly added processing capabilities that mimic early data warehousing features, comically calling the result the "Data Lakehouse" (or, in the Databricks variant, the Delta Lake).
Data Lakehouse:
According to Databricks, "A data lakehouse is a new, open data management paradigm
that combines the capabilities of data lakes and data warehouses, enabling BI and ML
on all data. ... Merging them into a single system means that data teams can move
faster as they can use data without accessing multiple systems." This statement is more aspirational than factual. Data warehouses represent forty years of continuous (though not always smooth) progress and provide all of the services that are needed, such as:
• AI-driven query optimizer
• Complex query formation
• Massively parallel operation based on the model, not just sharding
• Workload management
• Load balancing
• Scaling to thousands of simultaneous queries
• Full ANSI SQL and beyond
• In-database advanced analytics and support for ML
• Native handling of data types such as spatial and time series
The fact is that some data warehouse platforms perform all of these functions, and more, and are central to the operations of many businesses.
In the early seventies, the world was beset by an energy crisis. Some executives in Detroit decided that the US needed small cars, with which they had little experience, but they came up with a platform anyway. But Americans loved their pickup trucks, which accounted for a substantial share of the automakers' revenue, especially at Ford and Chevy. When you have a terrible solution, the worst thing you can do is pile more terrible decisions on top of it: witness the 1973 Ford Courier mini pickup truck, one of the worst-designed, most ill-conceived vehicles in history.
If you can query a JSON file in the data lakehouse transparently with SQL, you have accomplished something. But not enough. What troubles me most is that the data lakehouse's justification is that it's a data lake with some analytical capabilities. What I haven't heard anything about is understandability and usability. And the analytical capabilities it does have are mostly inherited from the expanding capabilities of the cloud services themselves.
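As a rough illustration of what "querying a JSON file with SQL" amounts to, here is a sketch that uses Python's built-in `sqlite3` module and SQLite's `json_extract` function as a stand-in for a lakehouse SQL engine. That substitution is an assumption, and it requires a SQLite build with JSON support, which recent versions include by default. The order documents are hypothetical. The query runs without complaint, but grouping by city alone silently merges the Paris, France orders with the Paris, US orders: the understandability problem in miniature.

```python
import json
import sqlite3

# Hypothetical raw order documents, as they might land in a lake.
docs = [
    {"order_id": 1, "city": "Paris", "country": "FR", "amount": 120.0},
    {"order_id": 2, "city": "Paris", "country": "US", "amount": 75.5},
    {"order_id": 3, "city": "Lyon", "country": "FR", "amount": 40.0},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (doc TEXT)")  # one JSON document per row
conn.executemany(
    "INSERT INTO raw_orders (doc) VALUES (?)",
    [(json.dumps(d),) for d in docs],
)

# SQL over JSON: json_extract pulls fields out of each document at query time.
by_city = conn.execute(
    """
    SELECT json_extract(doc, '$.city') AS city,
           SUM(json_extract(doc, '$.amount')) AS total
    FROM raw_orders
    GROUP BY json_extract(doc, '$.city')
    ORDER BY city
    """
).fetchall()

# Same data, grouped with the context that distinguishes the two Parises.
by_place = conn.execute(
    """
    SELECT json_extract(doc, '$.city') AS city,
           json_extract(doc, '$.country') AS country,
           SUM(json_extract(doc, '$.amount')) AS total
    FROM raw_orders
    GROUP BY json_extract(doc, '$.city'), json_extract(doc, '$.country')
    ORDER BY city, country
    """
).fetchall()

print(by_city)
print(by_place)
conn.close()
```

Adding the country to the grouping separates the two cities, but only because someone knew that `country` mattered; nothing in the raw documents says so.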
Cloud Data Warehouse:
There are principally three cloud data warehouses: Amazon Redshift, Snowflake, and Google BigQuery. Many other relational data warehouse technologies have acceptable cloud versions, but the cloud natives hold the high ground for now. At a sufficient level of maturity, they provide all of the functions listed above natively, rather than as capabilities bolted onto generic cloud features. However, things get a little blurry because the CDWs (cloud data warehouses) provide more than a traditional data warehouse does. One, for example, provides a public data exchange marketplace. I've noticed the word "warehouse" starting to disappear from their content.
Would you rather have a cloud-native data warehouse that can handle the most
challenging data warehouse tasks but can also provide most of the functionality of a
data lake (or, to put it another way, eliminate the need for a data lake), or would
you prefer a data lake with partial data warehouse capabilities slapped on?
To sum up:
1. The concept of a data lake is flawed. In an age of multi-cloud and hybrid-cloud distributed data, not to mention sprawling IoT sensor farms, there is no advantage to pulling it all together. AI-driven knowledge graphs, which locate and tag data where it is, are a far better alternative.
2. If you dismiss the data lake, you must of necessity dismiss the lakehouse.
3. Pay more attention to what your data means than to where you put it.
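The knowledge-graph alternative in point 1 rests on a simple idea: describe data where it lives instead of copying it. Here is a deliberately tiny sketch in Python, assuming metadata expressed as (subject, predicate, object) triples; the dataset names, concepts, and storage locations are hypothetical. An AI-driven knowledge graph would infer and maintain such tags; here they are written by hand. Asking for a concept returns every matching dataset and its location, and nothing is moved.

```python
# Hypothetical metadata triples: (subject, predicate, object).
# The datasets stay where they are; only descriptions of them are collected.
TRIPLES = [
    ("orders_eu", "locatedAt", "s3://eu-bucket/orders/"),
    ("orders_eu", "means", "CustomerOrder"),
    ("orders_us", "locatedAt", "gs://us-bucket/orders/"),
    ("orders_us", "means", "CustomerOrder"),
    ("sensor_feed", "locatedAt", "onprem://plant-7/iot/"),
    ("sensor_feed", "means", "MachineTelemetry"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def locate(concept: str) -> list[tuple[str, str]]:
    """Find datasets tagged with a concept, and where each one lives."""
    return [
        (s, location)
        for s, p, o in TRIPLES
        if p == "means" and o == concept
        for location in objects(s, "locatedAt")
    ]


for dataset, location in locate("CustomerOrder"):
    print(dataset, location)
```

The query is by meaning ("CustomerOrder"), not by place, which is the priority the summary argues for.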
To me, a data lake looks like static, "dumb" data neatly arranged. A data lakehouse, if you must use that term, should be fundamentally different from a data warehouse. It should be a comprehensive set of capabilities that provides a graph-based, linked, and contextualized information fabric (semantic metadata and linked datasets) with NLP (Natural Language Processing), sentiment analysis, rules engines, connectors, and canonical models for common domains. Add to that cognitive tools that can be plugged in to turn "dumb" data into information assets with speed, agility, reuse, and value. I haven't seen one yet.
Neil Raden founded Hired Brains Research in 1985 to provide thought leadership, context, and advisory
consulting and implementation services in Data Architecture, AI, Analytics/Data Science, and organizational
change for analytics for clients worldwide across many industries. Neil is a recognized authority on AI Ethics,
the co-author of the first book on Decision Management, "Smart (Enough) Systems," and the foundational
report for the Society of Actuaries, "Ethical Use of Artificial Intelligence for Actuaries." He welcomes your
comments at nraden@hiredbrains.com

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

Data lakehouse fallacies

Copyright 2021 Hired Brains Research and Neil Raden

Data Lake/Lakehouse/Cloud Data Warehouse: Which is Real?
By Neil Raden, Founder Hired Brains Research

Data Lake:
Hadoop (and later, other cloud storage options) was indifferent to the size and type of files it could process, unlike the rigid and far less scalable relational data warehouse. From it hatched the idea of a single place for everything: the data lake. In truth, it was a concept hatched by the Hadoop distributors to sell more licenses. Though it did simplify searching for and locating files, it provided no analytical processing tools at all. Moving a JSON file from Paris, France, to a cloud location in Paris, Texas, adds no value beyond some economies of scale in storage and processing.

The data lake collects raw data: thousands, perhaps millions of files. This is posited as a benefit. But is it really? At a certain level, "raw data" is an oxymoron. We can't triangulate data to see whether it is consistent with other instances of the same phenomenon or event. "Raw data" typically implies data that is to be used for a particular purpose; it is the starting point for drawing inferences and conclusions. The context of data (why, how, and when it was recorded, and by what method it was collected and then transformed) is essential. Context-free data simply does not exist. The perfect objectivity we assign to "raw data" is a myth. That's why, in data warehousing, we attempted to integrate and rationalize things.

Industry analyst Andrew Brust, in "Big on Data," quotes George Fraser, CEO of Fivetran: "I think 2021 will reveal the need for data lakes in the modern data stack is shrinking...there are no longer new technical reasons for adopting data lakes because data warehouses that separate compute from storage have emerged."
If that's not categorical enough for you, Fraser sums things up thus: "In the world of the modern data stack, data lakes are not the optimal solution. They are becoming legacy technology." For organizations that lack cloud-native data warehouses that separate compute from storage, or even lack a cloud strategy, that is something of an oversimplification. Calculating the costs of hybrid cloud, multi-cloud, and the separation of storage from compute borders on alchemy. And even a good approximation is good only at the moment you make it, because things change so quickly. There is one secret, though: no matter what approach you take, you will do worse without a model.

Another thing to consider is that "organization" is often an oxymoron. While there may be a single "strategy" for data architecture in most organizations, the result of acquisitions, legacies, geography, and just the usual punctuated progress is that there may be a collection of them, distributed physically and architecturally. The best advice is: pay more attention to what your data means than where you put it.

To patch some of the data lake idea's manifest deficiencies, cloud providers have regularly added processing capabilities that mimic early data warehousing features, comically calling the result the "Data Lakehouse" (or, in the Databricks variant, the Delta Lake).

Data Lakehouse:
According to Databricks, "A data lakehouse is a new, open data management paradigm that combines the capabilities of data lakes and data warehouses, enabling BI and ML on all data. ... Merging them into a single system means that data teams can move faster as they can use data without accessing multiple systems." This statement is more aspiration than fact. Data warehouses represent forty years of continuous (though not always smooth) progress and provide all of the services that are needed, such as:
• AI-driven query optimizer
• Complex query formation
• Massively parallel operation based on the model, not just sharding
• Workload Management
• Load Balancing
• Scaling to thousands of simultaneous queries
• Full ANSI SQL and beyond
• In-database Advanced Analytics and support for ML
• Ability to handle native data types such as spatial and time-series

The fact is that some data warehouse platforms do perform all of these functions and more, and are central to the operations of businesses.

In the early seventies, the world was beset by an energy crisis.
Some executives in Detroit decided that the US needed small cars, with which they had little experience, but they came up with a platform anyway. But Americans loved their pickup trucks, which accounted for a substantial share of the automakers' revenue, especially at Ford and Chevy. When you have a terrible solution, the worst thing you can do is pile on more terrible decisions: the 1973 Ford Courier mini pickup truck was one of the most poorly designed, ill-conceived vehicles in history.
If you can query a JSON file in the Data Lakehouse with SQL transparently, you have accomplished something. But not enough. What troubles me most is that the data lakehouse's excuse is that it's a data lake with some analytical capabilities. What I haven't heard about are understandability and usability. Those capabilities are mostly inherited from the expanding capabilities of cloud services themselves.

Cloud Data Warehouse:
There are principally three cloud data warehouses: AWS Redshift, Snowflake, and Google BigQuery. Many other relational data warehouse technologies have acceptable cloud versions, but the cloud natives claim the high ground for now. At a certain maturity, they provide all of the functions listed above as native capabilities, rather than as bolt-ons to generic cloud features. However, it does get a little blurry, because the CDWs provide more than a traditional data warehouse. One, for example, provides a public data exchange market. I've noticed the word "warehouse" starting to disappear from their content.

Would you rather have a cloud-native data warehouse that can handle the most challenging data warehouse tasks but can also provide most of the functionality of a data lake (or, to put it another way, eliminate the need for a data lake), or would you prefer a data lake with partial data warehouse capabilities slapped on?

To sum up:
1. The concept of a data lake is flawed. In an age of multi-cloud and hybrid-cloud distributed data, not to mention sprawling sensor farms of IoT, there is no advantage to pulling it all together. AI-driven knowledge graphs, which locate and tag data where it is, are a far better alternative.
2. If you dismiss the data lake, you must of necessity dismiss the lakehouse.
3. Pay more attention to what your data means than where you put it.

A data lake looks to me to be static "dumb" data neatly arranged.
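The lakehouse passage grants that querying a JSON file with SQL transparently is an accomplishment. Below is a minimal Python sketch of what that looks like, using SQLite's JSON functions (`json_extract`) as a stand-in for a lakehouse query engine. The order records are hypothetical, and the sketch illustrates the idea rather than any particular product's implementation.

```python
import json
import sqlite3

# Hypothetical order records, as they might land in a lake as raw JSON files.
records = [
    {"order_id": 1, "city": "Paris", "country": "FR", "amount": 120.0},
    {"order_id": 2, "city": "Paris", "country": "US", "amount": 80.0},
    {"order_id": 3, "city": "Lyon", "country": "FR", "amount": 45.5},
]
raw_files = [json.dumps(r) for r in records]

conn = sqlite3.connect(":memory:")
# Each document is stored untouched in one TEXT column (schema-on-read).
conn.execute("CREATE TABLE landing (doc TEXT)")
conn.executemany("INSERT INTO landing (doc) VALUES (?)", [(f,) for f in raw_files])

# json_extract lets plain SQL reach inside the documents at query time.
rows = conn.execute(
    """
    SELECT json_extract(doc, '$.country') AS country,
           SUM(json_extract(doc, '$.amount')) AS total
    FROM landing
    WHERE json_extract(doc, '$.city') = 'Paris'
    GROUP BY country
    ORDER BY country
    """
).fetchall()
print(rows)  # [('FR', 120.0), ('US', 80.0)]
```

The query runs, but nothing in the documents says what currency `amount` is in, or why and how each order was recorded. That missing meaning is the understandability gap the article points to: SQL access alone does not supply it.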
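The first summary point proposes knowledge graphs that locate and tag data where it is, instead of pulling it all into a lake. Below is a minimal Python sketch of that idea. It assumes a plain in-memory list of triples in place of an AI-driven knowledge graph (no machine learning, no RDF store), and every dataset name, storage URI, and concept is hypothetical. Datasets stay put, and you find them by what they mean.

```python
from collections import defaultdict

# Hypothetical metadata graph as (subject, predicate, object) triples.
# Datasets are never copied: the graph only records where they live
# and what they mean.
triples = [
    ("orders_eu", "located_at", "s3://eu-bucket/orders/"),
    ("orders_eu", "about", "SalesOrder"),
    ("orders_us", "located_at", "gs://us-bucket/orders/"),
    ("orders_us", "about", "SalesOrder"),
    ("plant_sensors", "located_at", "on-prem://plant-7/historian"),
    ("plant_sensors", "about", "MachineTemperature"),
    ("SalesOrder", "subclass_of", "BusinessEvent"),
]

# Index subjects by (predicate, object) for "about" and "subclass_of" lookups.
by_pred_obj = defaultdict(set)
for s, p, o in triples:
    by_pred_obj[(p, o)].add(s)

# Each dataset's physical location, kept as metadata only.
location = {s: o for s, p, o in triples if p == "located_at"}


def narrower(concept):
    """The concept plus every concept that is transitively a subclass of it."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub in by_pred_obj.get(("subclass_of", current), set()):
            if sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def locate(concept):
    """Map every dataset tagged with `concept` (or a narrower one) to where it lives."""
    return {
        ds: location[ds]
        for c in narrower(concept)
        for ds in by_pred_obj.get(("about", c), set())
        if ds in location
    }


print(sorted(locate("BusinessEvent")))  # ['orders_eu', 'orders_us']
```

Asking for `BusinessEvent` finds both order datasets through the `subclass_of` link, even though neither is tagged with that concept directly, and nothing was moved to answer the question. A real system would add the AI-driven tagging the article mentions; this sketch only shows the lookup side.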
A data lakehouse, if you must use that term, is fundamentally different from a data warehouse. It is a comprehensive set of capabilities that provides a graph-based, linked, and contextualized information fabric (semantic metadata and linked datasets) with NLP (Natural Language Processing), Sentiment Analysis, Rules Engines, Connectors, and Canonical Models for common domains. Add to that cognitive tools that can be plugged in to turn "dumb" data into information assets with speed, agility, reuse, and value. I haven't seen one yet.

Neil Raden founded Hired Brains Research in 1985 to provide thought leadership, context, and advisory consulting and implementation services in Data Architecture, AI, Analytics/Data Science, and organizational change for analytics for clients worldwide across many industries. Neil is a recognized authority on AI Ethics, the co-author of the first book on Decision Management, "Smart (Enough) Systems," and author of the foundational report for the Society of Actuaries, "Ethical Use of Artificial Intelligence for Actuaries." He welcomes your comments at nraden@hiredbrains.com