☺
ANALYSIS & VISUALISATION
DIGITAL COMPANY
DATA PROCESSING & STORAGE
DATA INTEGRATION
DATA EXPLOITATION
ENTERPRISE INFORMATION MANAGEMENT
DATA SOURCES
DATA WAREHOUSE
BUSINESS INTELLIGENCE
APPLICATION DEVELOPMENT
LEGACY
ERP, CRM…
IoT
BIG DATA
HADOOP
ADVANCED ANALYTICS
MOBILE APPS
ARTIFICIAL INTELLIGENCE
APPLICATION PERFORMANCE MANAGEMENT
CLOUD
BLOCKCHAIN
Canada
Czechia
Slovakia
Germany
Bulgaria
Russia
USA
Thailand
Adastra
Adastra Business Consulting
Adastra One
Acamar
Ataccama
Proboston Creative
Blindspot AI
Instarea
2000+ Employees
Source: CzechInvest, April 2019
Sources: Domo, erwin
Sources: Wikipedia, IBM
1985 - 1995 Prehistory
•Controlled chaos
•Best practice awakening
•Manual scripting
•Basic Relational Analytics
1995 – 2005 Antiquity
•Clash of Titans: Kimball vs. Inmon
•Mature Best practices
•Enterprise Data Warehouse
•ETL
•OLAP
•Complex Relational Analytics
2005-2015 Middle Ages
•High-end traditional Data Warehousing
•Hub-and-spoke architecture
•Data Governance
•MDM (mechanically divided meat)
•MDD
•ELT
•Data Vault
•Data Mining
•DW appliance
•Columnar DB
•In-memory DB
•Hadoop Stack Dawn
•Unstructured data analytics
2015 – Modern Age
•Logical Data Warehouse
•Extended Data Warehouse
•Data Lake
•Polyglot Architecture
•Kappa / Lambda
•Databus
•Data Pipeline
•Real-time
•Big Data ETL
•Open Source
•Big Data Analytics
•Self-service
•Data Science
•Machine Learning & AI
•Hadoop without Hadoop
•Stream Analytics
•All data analytics
•Data Management Platform
•Cloud
•Automation
•Autonomous Technologies
•New Elasticities of Compute & Storage
•Serverless (incl. Serverless DW)
Small & Midsize DW
Midsize & Large DW
Midsize & Large DW
Vision Reality
Bigger Data Volumes
Semi-structured & Unstructured Data
Solution Complexity
Fast Data
Legacy Data Warehouses
Slow Cloud Adoption (incl. DWaaS)
Tighter SLA & Low Performance
Better Service Quality
Regulatory Requests
Increasing TCO
Bad Time 2 Market
Weird Technology Stacks
Weak Integration with Enterprise Architecture
Sterile Multitenant Solutions
Technology First
Data Lineage Obsession
Lack of DW experts
Accumulate Technological Debts
Non-actionable Analytics
Too many Data Warehouses
Divided Data Platforms (Silos)
Low Added Value for Business
No Raw Data
Relaxed or Overfitted Data Quality
Missing Self Service
Insufficient History
No Automation
Hard Coding
No Agile
Data Lake as DW Replacement
Slow Provisioning
Limited Elasticity
Old Templates for Current Problems
Bad Data Granularity
One Processing Frequency
Data Hoarding
Megalomaniac Scope
No Parallel Processing
Limited Data History
Missing Metadata
Missing Documentation
Missing Data Architecture
Missing Design Standards
Ineffective ETL Development
Infinite cycle of insourcing & outsourcing
Multivendor Competitions without Strategy
Missing Data Strategy
Siloed MDMs
Data Stream Isolation
Not Real Reference Data or only for DW
No Business Analytics Above
Unmanaged Data Variety and Variance
No Real Data Governance
Evil Database Dwarves
https://learn.panoply.io/hubfs/Downloadable%20Content/Data_Warehouse_Trends_2019.pdf
Opportunities Abound (DWs are rare in SME segment)
Redshift is losing ground
Complexity remains a significant ‘sore spot’ for data warehouse users
Performance issues are also persistent
PROPER TOOLS SIMPLIFY DATA WAREHOUSING (AND ALSO NASA’S EXPERIMENTS)
2012
2018
HIGH PERFORMANCE IS MANDATORY (AND COOL)
DATA INTEGRATION ARCHITECTURE
ETL vs. ELT Big Data ETL: ETL on Hadoop vs. ELT in Hadoop
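The ETL vs. ELT distinction can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the target platform; the table names (`stage_sales`, `sales`), the sample rows, and the function names are hypothetical and not taken from the slides. In the ETL path the cleansing runs in the integration layer before loading; in the ELT path the raw rows are loaded first and transformed inside the database with SQL.

```python
import sqlite3

# Hypothetical source rows: (customer, amount as text, currency in mixed case).
SOURCE_ROWS = [
    ("alice", "10.50", "eur"),
    ("bob", "7.25", "EUR"),
    ("alice", "4.50", "Eur"),
]


def run_etl(rows):
    """ETL: transform outside the database, then load the finished result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, currency TEXT)")
    # Transformation happens here, in the integration layer.
    cleaned = [(c, float(a), cur.upper()) for c, a, cur in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
    return conn


def run_elt(rows):
    """ELT: load raw rows into a stage table, then transform inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stage_sales (customer TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO stage_sales VALUES (?, ?, ?)", rows)
    # Transformation is pushed down to the target engine as SQL.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT customer, CAST(amount AS REAL) AS amount, UPPER(currency) AS currency "
        "FROM stage_sales"
    )
    return conn


def totals(conn):
    """Return the total amount per customer, ordered by customer."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()
```

Both paths should produce the same `sales` content; the difference is where the transformation runs and whether the raw data is still available in the target afterwards (here, in `stage_sales`).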
Lambda = Kappa + Batch Layer
Kappa = Lambda – Batch Layer
Polyglot / Big SQL
Sources: Ericsson, Oracle, Software Advice
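The relation "Lambda = Kappa + Batch Layer" can be sketched as follows. This is a simplified illustration, not a production design: the event shape (`{"page": ...}`), the page-view counting use case, and all function names are assumptions made for the example. Lambda keeps two code paths (batch and speed) and merges their views at query time; Kappa keeps a single streaming path and reprocesses by replaying the log.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute a view from the full master dataset."""
    return Counter(event["page"] for event in master_events)


def speed_view(recent_events):
    """Speed layer: incremental counts for events the batch run has not seen yet."""
    view = Counter()
    for event in recent_events:
        view[event["page"]] += 1
    return view


def lambda_query(master_events, recent_events):
    """Lambda serving layer: merge the batch view and the speed view at query time."""
    return batch_view(master_events) + speed_view(recent_events)


def kappa_query(event_log):
    """Kappa: one streaming code path; reprocessing means replaying the whole log."""
    view = Counter()
    for event in event_log:
        view[event["page"]] += 1
    return view
```

For example, with a log of five events where the batch run has covered the first three, `lambda_query(log[:3], log[3:])` and `kappa_query(log)` should return the same counts. The trade-off is that Lambda maintains the counting logic twice, while Kappa depends on a replayable log.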
Attribute | Traditional Data Warehouse (DW) | Data Lake (DL) | Extended Data Warehouse (XDW)
Data | Structured | Structured & Semi-Structured & Unstructured | Structured & Semi-Structured & Unstructured
Data Processing | Processed | Raw | Processed & Raw
Data Schema | Schema-on-write | Schema-on-read | Schema-on-write & Schema-on-read
Data Model | Relational | Object-based | Relational & Object-based
Data History | Hierarchically archived | No hierarchy | Hierarchically archived & No hierarchy
Agility | Fixed configuration | Reconfigured anytime as needed | Fixed configuration & Reconfigured anytime as needed
Security | Mature | Maturing | Mature
Primary Users | Data analysts & Business professionals | Data scientists | Data analysts & Business professionals & Data scientists
Technology | RDBMS | NoSQL DBMS, Hadoop, Other distributed storage | RDBMS, NoSQL DBMS, Hadoop, Other distributed storage
Agility | Low | High | Medium
Added Value | Medium | Medium | High
Cost | High | Low | Medium
DATA WAREHOUSE VS. DATA LAKE: DIFFERENT TECHNOLOGIES BUT SAME RESULTS
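The "Data Schema" row of the comparison (schema-on-write vs. schema-on-read) can be illustrated with a small sketch. The schema (`id`, `name`), the list-based "table" and "lake", and the function names are hypothetical stand-ins chosen for the example; a real warehouse or lake would enforce or apply the schema through its own engine.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED = {"id": int, "name": str}


def write_validated(table, record):
    """Schema-on-write: reject records that do not match the schema at load time."""
    for field, ftype in REQUIRED.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    # Only the modelled fields are stored.
    table.append({field: record[field] for field in REQUIRED})


def write_raw(lake, payload):
    """Schema-on-read: store the payload as-is, with no checks at load time."""
    lake.append(payload)


def read_with_schema(lake):
    """Apply the schema only when reading; skip payloads that do not fit."""
    for payload in lake:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if all(isinstance(record.get(f), t) for f, t in REQUIRED.items()):
            yield {field: record[field] for field in REQUIRED}
```

In this sketch the warehouse-style path fails fast on bad input, while the lake-style path keeps everything and moves the cost of interpretation to each reader. An extended data warehouse, as described in the table, would combine both.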
Application Server
Web Server
Pentaho Data Integration (Web Console)
Adastra Workflow GUI
Adastra Ref Books GUI
Adastra Workflow Middleware
Adastra Ref Books Middleware
Pentaho Data Integration (Carte)
Pentaho Data Integration (Repository)
Adastra Workflow for RDBMS
Database Scheduler
Adastra Ref Books Store
Adastra ELT
SAP PowerDesigner
Adastra ELT & Workflow Code Generator
External Workflow
Adastra Data Model
Design Time
Run Time
RDBMS
Data Source
Data Warehouse
Stream Processor
Stage Database
ETL/ELT
Custom Development
Big Data ETL
Cluster Filesystem
Data Services
Data Extractor
File System
Messaging
Change Data Capture
Clustered Stage DB
NoSQL
Many Ways from Data Sources to Data Warehouse (Real Example)
On-Premises
Applications
Data
Runtime
Middleware
OS
Virtualization
Servers
Storage
Networking
IaaS (Infrastructure as a Service)
Applications
Data
Runtime
Middleware
OS
Virtualization
Servers
Storage
Networking
PaaS (Platform as a Service)
Applications
Data
Runtime
Middleware
OS
Virtualization
Servers
Storage
Networking
SaaS (Software as a Service)
Applications
Data
Runtime
Middleware
OS
Virtualization
Servers
Storage
Networking
ELASTICITY LEADS THE WAY (AND THIS IS NOT 737 MAX ON THE PICTURE)
AUTOMATION & AUTONOMOUS TECHNOLOGIES ARE THE NEW BLACK
DATA GOVERNANCE MUST BE COMPLEX
Concepts
Vision & Mission
Guiding Principles
Organization & Roles
Business Rules
Activities
Scope
Benefits & Goals
Components
Data Architecture
Data Quality
Data Integration
Operations
Security
RDM & MDM
Metadata
Data Platform & BI
Tools
CASE
Enterprise Metadata Repository
Data Quality Tools
QA Framework
Workflow & Orchestration
IDE
Audit Log
Resource Management
RDBMS
NoSQL
Hadoop
Integration tools
Monitoring
Source Code Repository
Testing Tools
Others
Why What How
Business Drivers
• Digitalization
• Smart Data incl. Single Customer View
• Data Literacy
• External Analytics Innovation
• Quick Time-to-Market
• Business Process Autonomous Optimization
• Individual Personal Offers at Massive Scale
Analytics
• Single View of Facts
• Augmented Analytics
• Advanced Data Visualization
• Predictive & Prescriptive Analytics
• Collaborative Business Intelligence
• Natural Language Processing
• Self-Service BI & Data Preparation
• Data Discovery
• Data Science
• Effective Advanced Analytics incl. Real-time
• External Analytics Innovation
• Embedded Analytics
• Stream Analytics
• Geospatial Analytics
• Data Blending
Governance & Architecture
• Data Quality Management & Master Data Management
• Holistic Data Governance incl. Big Data
• Embracing Data Catalogs
• Real Data Science Governance
• Full Data Lifecycle Management
• Metadata Integration incl. Real-time
• Advanced Security & Audit
• Data Warehouse Modernization (XDW)
• Data Warehouse Automation (DWA)
• Doom of Classical Data Warehousing (Hub & Spoke)
• Lambda & Kappa Architecture
• Data Lake 2.0
• Analytic Data Store 2.0
Technology
• Serverless
• Compute & Storage Divided Elasticities
• Extreme Performance & Appliances
• Polymorphic Data Models
• Artificial Intelligence
• Cloud Continuum
• Edge Computing
• Complex Metadata Driven Development
• Data Management Platforms
• Data & Processing Offloading
• Data Virtualization & Data Federation (Polyglot Architecture)
• Autonomous Technologies
• Next Generation Sandboxing
• In-memory
• DWaaS
• Digital Twins
☺
TORONTO
LONDON
FRANKFURT
PRAGUE
BRATISLAVA
SOFIA
MOSCOW
THANK YOU !
CONTACT ADASTRA GROUP
CZECH REPUBLIC
KAROLINSKÁ 654/2
186 00 PRAHA 8
+420 271 733 303
INFOCZ@ADASTRAGRP.COM
WWW.ADASTRA.CZ
https://www.linkedin.com/in/martin-b%C3%A9m-7a92089/

More Related Content

What's hot

Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
Robert Chong
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Data Con LA
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
sambiswal
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
Martin Bém
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
CCG
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
Calum Miller
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
Caserta
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI Standards
Arcadia Data
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
John Yeung
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
Caserta
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
Bui Ha
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
Aaron (Ari) Bornstein
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
CCG
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
SoftServe
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
Eduardo Castro
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
James Serra
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 

What's hot (20)

Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI Standards
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 

Similar to Pitfalls of Data Warehousing_2019-04-24

Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
Kent Graziano
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
Microsoft
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
kcmallu
 
DA_01_Intro.pptx
DA_01_Intro.pptxDA_01_Intro.pptx
DA_01_Intro.pptx
Alok Mohapatra
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
Ashnikbiz
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
elephantscale
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
Philippe Julio
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
James Serra
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
DATAVERSITY
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
DATAVERSITY
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
Amazon Web Services
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 
Architectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopArchitectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoop
Anu Ravindranath
 

Similar to Pitfalls of Data Warehousing_2019-04-24 (20)

Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
DA_01_Intro.pptx
DA_01_Intro.pptxDA_01_Intro.pptx
DA_01_Intro.pptx
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
Architectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopArchitectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoop
 

More from Martin Bém

Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04
Martin Bém
 
Prague data management meetup #31 2020-01-27
Prague data management meetup #31 2020-01-27Prague data management meetup #31 2020-01-27
Prague data management meetup #31 2020-01-27
Martin Bém
 
Meetup 2018-10-23
Meetup 2018-10-23Meetup 2018-10-23
Meetup 2018-10-23
Martin Bém
 
Prague data management meetup 2018-04-17
Prague data management meetup 2018-04-17Prague data management meetup 2018-04-17
Prague data management meetup 2018-04-17
Martin Bém
 
Prague data management meetup 2018-05-22
Prague data management meetup 2018-05-22Prague data management meetup 2018-05-22
Prague data management meetup 2018-05-22
Martin Bém
 
Prague data management meetup 2018-02-27
Prague data management meetup 2018-02-27Prague data management meetup 2018-02-27
Prague data management meetup 2018-02-27
Martin Bém
 
Prague data management meetup 2018-01-30
Prague data management meetup 2018-01-30Prague data management meetup 2018-01-30
Prague data management meetup 2018-01-30
Martin Bém
 
Prague data management meetup 2017-11-21
Prague data management meetup 2017-11-21Prague data management meetup 2017-11-21
Prague data management meetup 2017-11-21
Martin Bém
 
Prague data management meetup 2017-10-24
Prague data management meetup 2017-10-24Prague data management meetup 2017-10-24
Prague data management meetup 2017-10-24
Martin Bém
 
Prague data management meetup 2017-09-26
Prague data management meetup 2017-09-26Prague data management meetup 2017-09-26
Prague data management meetup 2017-09-26
Martin Bém
 
Prague data management meetup 2017-05-16
Prague data management meetup 2017-05-16Prague data management meetup 2017-05-16
Prague data management meetup 2017-05-16
Martin Bém
 
Prague data management meetup 2017-03-28
Prague data management meetup 2017-03-28Prague data management meetup 2017-03-28
Prague data management meetup 2017-03-28
Martin Bém
 
Prague data management meetup 2017-04-25
Prague data management meetup 2017-04-25Prague data management meetup 2017-04-25
Prague data management meetup 2017-04-25
Martin Bém
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28
Martin Bém
 
Prague data management meetup 2016-11-22
Prague data management meetup 2016-11-22Prague data management meetup 2016-11-22
Prague data management meetup 2016-11-22
Martin Bém
 
Prague data management meetup 2016-10-17
Prague data management meetup 2016-10-17Prague data management meetup 2016-10-17
Prague data management meetup 2016-10-17
Martin Bém
 
Prague data management meetup 2016-09-22
Prague data management meetup 2016-09-22Prague data management meetup 2016-09-22
Prague data management meetup 2016-09-22
Martin Bém
 
Prague data management meetup 2016-03-07
Prague data management meetup 2016-03-07Prague data management meetup 2016-03-07
Prague data management meetup 2016-03-07
Martin Bém
 
Prague data management meetup 2016-01-12 pub
Prague data management meetup 2016-01-12 pubPrague data management meetup 2016-01-12 pub
Prague data management meetup 2016-01-12 pub
Martin Bém
 
Prague data management meetup 2015 11-23
Prague data management meetup 2015 11-23Prague data management meetup 2015 11-23
Prague data management meetup 2015 11-23
Martin Bém
 

More from Martin Bém (20)

Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04
 
Prague data management meetup #31 2020-01-27
Prague data management meetup #31 2020-01-27Prague data management meetup #31 2020-01-27
Prague data management meetup #31 2020-01-27
 
Meetup 2018-10-23
Meetup 2018-10-23Meetup 2018-10-23
Meetup 2018-10-23
 
Prague data management meetup 2018-04-17
Prague data management meetup 2018-04-17Prague data management meetup 2018-04-17
Prague data management meetup 2018-04-17
 
Prague data management meetup 2018-05-22
Prague data management meetup 2018-05-22Prague data management meetup 2018-05-22
Prague data management meetup 2018-05-22
 
Prague data management meetup 2018-02-27
Prague data management meetup 2018-02-27Prague data management meetup 2018-02-27
Prague data management meetup 2018-02-27
 
Prague data management meetup 2018-01-30
Prague data management meetup 2018-01-30Prague data management meetup 2018-01-30
Prague data management meetup 2018-01-30
 
Prague data management meetup 2017-11-21
Prague data management meetup 2017-11-21Prague data management meetup 2017-11-21
Prague data management meetup 2017-11-21
 
Prague data management meetup 2017-10-24
Prague data management meetup 2017-10-24Prague data management meetup 2017-10-24
Prague data management meetup 2017-10-24
 
Prague data management meetup 2017-09-26
Prague data management meetup 2017-09-26Prague data management meetup 2017-09-26
Prague data management meetup 2017-09-26
 
Prague data management meetup 2017-05-16
Prague data management meetup 2017-05-16Prague data management meetup 2017-05-16
Prague data management meetup 2017-05-16
 
Prague data management meetup 2017-03-28
Prague data management meetup 2017-03-28Prague data management meetup 2017-03-28
Prague data management meetup 2017-03-28
 
Prague data management meetup 2017-04-25
Prague data management meetup 2017-04-25Prague data management meetup 2017-04-25
Prague data management meetup 2017-04-25
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28
 
Prague data management meetup 2016-11-22
Prague data management meetup 2016-11-22Prague data management meetup 2016-11-22
Prague data management meetup 2016-11-22
 
Prague data management meetup 2016-10-17
Prague data management meetup 2016-10-17Prague data management meetup 2016-10-17
Prague data management meetup 2016-10-17
 
Prague data management meetup 2016-09-22
Prague data management meetup 2016-09-22Prague data management meetup 2016-09-22
Prague data management meetup 2016-09-22
 
Prague data management meetup 2016-03-07
Prague data management meetup 2016-03-07Prague data management meetup 2016-03-07
Prague data management meetup 2016-03-07
 
Prague data management meetup 2016-01-12 pub
Prague data management meetup 2016-01-12 pubPrague data management meetup 2016-01-12 pub
Prague data management meetup 2016-01-12 pub
 
Prague data management meetup 2015 11-23
Prague data management meetup 2015 11-23Prague data management meetup 2015 11-23
Prague data management meetup 2015 11-23
 

Pitfalls of Data Warehousing_2019-04-24

  • 3. ANALYSIS & VISUALISATION DIGITAL COMPANY DATA PROCESSING & STORAGE DATA INTEGRATION DATA EXPLOITATION ENTERPRISE INFORMATION MANAGEMENT DATA SOURCES DATA WAREHOUSE BUSINESS INTELLIGENCE APPLICATION DEVELOPMENT LEGACY ERP, CRM… IoT BIG DATA HADOOP ADVANCED ANALYTICS MOBILE APPS ARTIFICIAL INTELLIGENCE APPLICATION PERFORMANCE MANAGEMENT CLOUD BLOCKCHAIN
      Canada, Czechia, Slovakia, Germany, Bulgaria, Russia, USA, Thailand
      Adastra, Adastra Business Consulting, Adastra One, Acamar, Ataccama, Proboston Creative, Blindspot AI, Instarea
      2000+ Employees
  • 10. 1985 - 1995 Prehistory: Controlled chaos; Best practice awakening; Manual scripting; Basic Relational Analytics
      1995 - 2005 Antiquity: Clash of Titans: Kimball vs. Inmon; Mature Best practices; Enterprise Data Warehouse; ETL; OLAP; Complex Relational Analytics
      2005 - 2015 Middle Ages: High-end traditional Data Warehousing; Hub-and-spoke architecture; Data Governance; MDM (mechanically divided meat); MDD; ELT; Data Vault; Data Mining; DW appliance; Columnar DB; In-memory DB; Hadoop Stack Dawn; Unstructured data analytics
      2015 - Modern Age: Logical Data Warehouse; Extended Data Warehouse; Data Lake; Polyglot Architecture; Kappa / Lambda; Databus; Data Pipeline; Real-time; Big Data ETL; Open Source; Big Data Analytics; Self-service; Data Science; Machine Learning & AI; Hadoop without Hadoop; Stream Analytics; All data analytics; Data Management Platform; Cloud; Automation; Autonomous Technologies; New Elasticities of Compute & Storage; Serverless (incl. Serverless DW)
  • 11. Small & Midsize DW Midsize & Large DW Midsize & Large DW
  • 14. Bigger Data Volumes; Semi-structured & Unstructured Data; Solution Complexity; Fast Data; Legacy Data Warehouses; Slow Cloud Adoption (incl. DWaaS); Tighter SLA & Low Performance; Better Service Quality; Regulatory Requests; Increasing TCO; Bad Time 2 Market; Weird Technology Stacks; Weak Integration with Enterprise Architecture; Sterile Multitenant Solutions; Technology First; Data Lineage Obsession; Lack of DW experts; Accumulated Technological Debts; Non-actionable Analytics; Too many Data Warehouses; Divided Data Platforms (Silos); Low Added Value for Business; No Raw Data; Relaxed or Overfitted Data Quality; Missing Self Service; Insufficient History; No Automation; Hard Coding; No Agile; Data Lake as DW Replacement; Slow Provisioning; Limited Elasticity; Old Templates for Current Problems; Bad Data Granularity; One Processing Frequency; Data Hoarding; Megalomaniac Scope; No Parallel Processing; Limited Data History; Missing Metadata; Missing Documentation; Missing Data Architecture; Missing Design Standards; Ineffective ETL Development; Infinite cycle of insourcing & outsourcing; Multivendor Competitions without Strategy; Missing Data Strategy; Siloed MDMs; Data Stream Isolation; Not Real Reference Data or only for DW; No Business Analytics Above; Unmanaged Data Variety and Variance; No Real Data Governance; Evil Database Dwarves
  • 15. Source: https://learn.panoply.io/hubfs/Downloadable%20Content/Data_Warehouse_Trends_2019.pdf
      Opportunities Abound (DWs are rare in SME segment); Redshift is losing ground; Complexity remains a significant ‘sore spot’ for data warehouse users; Performance issues are also persistent
  • 16. PROPER TOOLS SIMPLIFY DATA WAREHOUSING (AND ALSO NASA’S EXPERIMENTS)
  • 17. 2012
  • 18. 2018
  • 19. HIGH PERFORMANCE IS MANDATORY (AND COOL)
  • 20. DATA INTEGRATION ARCHITECTURE: ETL vs. ELT; Big Data ETL: ETL on Hadoop vs. ELT in Hadoop; Lambda = Kappa + Batch Layer; Kappa = Lambda – Batch Layer; Polyglot / Big SQL. Sources: Ericsson, Oracle, Software Advice
  • 21. Comparison: Traditional Data Warehouse (DW) vs. Data Lake (DL) vs. Extended Data Warehouse (XDW)
      Data: DW = Structured; DL = Structured & Semi-Structured & Unstructured; XDW = Structured & Semi-Structured & Unstructured
      Data Processing: DW = Processed; DL = Raw; XDW = Processed & Raw
      Data Schema: DW = Schema-on-write; DL = Schema-on-read; XDW = Schema-on-write & Schema-on-read
      Data Model: DW = Relational; DL = Object-based; XDW = Relational & Object-based
      Data History: DW = Hierarchically archived; DL = No hierarchy; XDW = Hierarchically archived & No hierarchy
      Agility: DW = Fixed configuration; DL = Reconfigured anytime as needed; XDW = Fixed configuration, Reconfigured anytime as needed
      Security: DW = Mature; DL = Maturing; XDW = Mature
      Primary Users: DW = Data analysts & Business professionals; DL = Data scientists; XDW = Data analysts & Business professionals & Data scientists
      Technology: DW = RDBMS; DL = NoSQL DBMS, Hadoop, Other distributed storages; XDW = RDBMS, NoSQL DBMS, Hadoop, Other distributed storages
      Agility: DW = Low; DL = High; XDW = Medium
      Added Value: DW = Medium; DL = Medium; XDW = High
      Cost: DW = High; DL = Low; XDW = Medium
  • 22. DATA WAREHOUSE VS. DATA LAKE: DIFFERENT TECHNOLOGIES BUT SAME RESULTS
  • 23. Application Server; Web Server; Pentaho Data Integration (Web Console); Adastra Workflow GUI; Adastra Ref Books GUI; Adastra Workflow Middleware; Adastra Ref Books Middleware; Pentaho Data Integration (Carte); Pentaho Data Integration (Repository); Adastra Workflow for RDBMS; Database Scheduler; Adastra Ref Books Store; Adastra ELT; SAP PowerDesigner; Adastra ELT & Workflow Code Generator; External Workflow; Adastra Data Model; Design Time; Run Time; RDBMS
  • 24. Many Ways from Data Sources to Data Warehouse (Real Example): Data Source; Data Warehouse; Stream Processor; Stage Database; ETL/ELT; Custom Development; Big Data ETL; Cluster Filesystem; Data Services; Data Extractor; File System; Messaging; Change Data Capture; Clustered Stage DB; NoSQL
  • 25. On-Premises: Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking
      IaaS (Infrastructure as a Service): Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking
      PaaS (Platform as a Service): Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking
      SaaS (Software as a Service): Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking
  • 26. ELASTICITY LEADS THE WAY (AND THIS IS NOT 737 MAX ON THE PICTURE)
  • 27. AUTOMATION & AUTONOMOUS TECHNOLOGIES ARE THE NEW BLACK
  • 29. DATA GOVERNANCE MUST BE COMPLEX
      Concepts: Vision & Mission; Guiding Principles; Organization & Roles; Business Rules; Activities; Scope; Benefits & Goals
      Components: Data Architecture; Data Quality; Data Integration; Operations; Security; RDM & MDM; Metadata; Data Platform & BI
      Tools: CASE; Enterprise Metadata Repository; Data Quality Tools; QA Framework; Workflow & Orchestration; IDE; Audit Log; Resource Management; RDBMS; NoSQL; Hadoop; Integration tools; Monitoring; Source Code Repository; Testing Tools; Others
      Why; What; How
  • 31. Business Drivers: Digitalization; Smart Data incl. Single Customer View; Data Literacy; External Analytics Innovation; Quick Time-to-Market; Business Process Autonomous Optimization; Individual Personal Offers at Massive Scale
      Analytics: Single View of Facts; Augmented Analytics; Advanced Data Visualization; Predictive & Prescriptive Analytics; Collaborative Business Intelligence; Natural Language Processing; Self-Service BI & Data Preparation; Data Discovery; Data Science; Effective Advanced Analytics incl. Real-time; External Analytics Innovation; Embedded Analytics; Stream Analytics; Geospatial Analytics; Data Blending
      Governance & Architecture: Data Quality Management & Master Data Management; Holistic Data Governance incl. Big Data; Embracing Data Catalogs; Real Data Science Governance; Full Data Lifecycle Management; Metadata Integration incl. Real-time; Advanced Security & Audit; Data Warehouse Modernization (XDW); Data Warehouse Automation (DWA); Doom of Classical Data Warehousing (Hub & Spoke); Lambda & Kappa Architecture; Data Lake 2.0; Analytic Data Store 2.0
      Technology: Serverless; Compute & Storage Divided Elasticities; Extreme Performance & Appliances; Polymorphic Data Models; Artificial Intelligence; Cloud Continuum; Edge Computing; Complex Metadata Driven Development; Data Management Platforms; Data & Processing Offloading; Data Virtualization & Data Federation (Polyglot Architecture); Autonomous Technologies; Next Generation Sandboxing; In-memory; DWaaS; Digital Twins
  • 33. Bigger Data Volumes; Semi-structured & Unstructured Data; Solution Complexity; Fast Data; Legacy Data Warehouses; Slow Cloud Adoption (incl. DWaaS); Tighter SLA & Low Performance; Better Service Quality; Regulatory Requests; Increasing TCO; Bad Time 2 Market; Weird Technology Stacks; Weak Integration with Enterprise Architecture; Sterile Multitenant Solutions; Technology First; Data Lineage Obsession; Lack of DW experts; Accumulated Technological Debts; Non-actionable Analytics; Too many Data Warehouses; Divided Data Platforms (Silos); Low Added Value for Business; No Raw Data; Relaxed or Overfitted Data Quality; Missing Self Service; Insufficient History; No Automation; Hard Coding; No Agile; Data Lake as DW Replacement; Slow Provisioning; Limited Elasticity; Old Templates for Current Problems; Bad Data Granularity; One Processing Frequency; Data Hoarding; Megalomaniac Scope; No Parallel Processing; Limited Data History; Missing Metadata; Missing Documentation; Missing Data Architecture; Missing Design Standards; Ineffective ETL Development; Infinite cycle of insourcing & outsourcing; Multivendor Competitions without Strategy; Missing Data Strategy; Siloed MDMs; Data Stream Isolation; Not Real Reference Data or only for DW; No Business Analytics Above; Unmanaged Data Variety and Variance; No Real Data Governance; Evil Database Dwarves
  • 36. TORONTO, LONDON, FRANKFURT, PRAGUE, BRATISLAVA, SOFIA, MOSCOW. THANK YOU! CONTACT: ADASTRA GROUP CZECH REPUBLIC, KAROLINSKÁ 654/2, 186 00 PRAHA 8, +420 271 733 303, INFOCZ@ADASTRAGRP.COM, WWW.ADASTRA.CZ, https://www.linkedin.com/in/martin-b%C3%A9m-7a92089/
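
The schema-on-write versus schema-on-read distinction from the DW / DL / XDW comparison on slide 21 can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `SCHEMA` dictionary, the function names, and the sample records are hypothetical and do not come from the slides or from any particular product.

```python
import json

# Hypothetical schema for a "customer" record (an assumption for illustration).
SCHEMA = {"id": int, "name": str, "country": str}


def write_with_schema(record: dict) -> dict:
    """Schema-on-write (warehouse style): validate and coerce before storing."""
    missing = SCHEMA.keys() - record.keys()
    if missing:
        # Rows that do not fit the schema are rejected at load time.
        raise ValueError(f"missing fields: {sorted(missing)}")
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


def read_with_schema(raw_line: str) -> dict:
    """Schema-on-read (lake style): raw text is stored as-is, structure is applied when queried."""
    record = json.loads(raw_line)
    return {
        field: cast(record[field]) if field in record else None
        for field, cast in SCHEMA.items()
    }


# Warehouse: only conforming, already typed rows are stored.
warehouse = [write_with_schema({"id": "1", "name": "Alice", "country": "CZ"})]

# Lake: anything can be stored; gaps and extra fields surface only at read time.
lake = ['{"id": "2", "name": "Bob", "extra": "kept as-is"}']
lake_view = [read_with_schema(line) for line in lake]
```

In this sketch the warehouse path fails fast on incomplete input, while the lake path accepts the raw line and fills missing fields with `None` when it is read. An extended data warehouse in the sense of slide 21 would combine both paths.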