The Death of the Star Schema
WEBINAR
Speakers
Nick Jewell
Sr Director, Product Marketing at Incorta
Technology Evangelist. 25+ years of analytics expertise in Computer-Aided Drug Design, Financial Services & Consulting
dataIQ100 (2018, 2020, 2021)
DataKind Ambassador
Claudia Imhoff
Founder of Boulder BI Brain Trust (BBBT)
A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives.
Pallavi Mishra
Sales Engineer at Incorta
Focused. Determined. Passionate. I am a keen observer and a quick learner. I have an inclination towards leveraging the evolving technology in conjunction with the right business acumen to solve complex problems.
1970s - 2000s: Relational Databases
• Good for highly structured data
• Simple and Reliable
• Good for small to medium data sets
Michael Lesk, “How much information is there in the world?” (1997)
“There may be a few thousand petabytes of information… we will be able to save everything… no information thrown away… typical information will never be looked at”
https://www.lesk.com/mlesk/ksg97/ksg.html
Rise of the Internet: 1990s - early 2000s
Rise in data (Doug Laney)
Volume: Clickstream
Velocity: High velocity transactions,
digitalisation, multi-channels
Variety:
• Structured
• Semi-Structured
• Unstructured
[Chart axis: telecommunication, in optimally compressed MB]
Agenda
Life of the Star Schema
Death of the Star Schema
Benefits of Eliminating Star Schemas
Getting Started
Life of the Star Schema
Genesis of the Star Schema
The Data Warehouse Era begins
• Contains integrated data from multiple sources
• Sole purpose was decision support
Relational DBMS technology used Date-Codd rules for data design
• Most efficient way to store data
• Least efficient performance for multi-join queries
Enter the Star Schema: a database design that mirrors the business
• Allows the business community to ask many questions
• And get reasonable response times
It’s the ’80s!
Star Schema – a physical instantiation of a multi-join process
• Significant data denormalization to improve join performance
• Fact table surrounded by dimension tables
• Great way to perform multi-dimensional analysis…
• As long as analytical processes or data never change…
[Diagram: star schema]
Fact table keys: Time_ID, Product_ID, Program_ID, Location_ID, Customer_ID, Order_ID, etc.
Fact table measures: Counts, Usage, Dollars
Dimension tables: Customer, Order, Location, Time, Product, Channel, Program, Campaign
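
To make the fact-and-dimension layout concrete, here is a minimal sketch of a star schema and a multi-join query, using Python's built-in sqlite3 module. The table names, columns, and sample rows (fact_sales, dim_customer, dim_product, dim_time) are illustrative assumptions, not taken from the webinar.

```python
import sqlite3

def build_star(conn):
    """Create a tiny star schema: one fact table and three dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds foreign keys to each dimension plus the measures.
        CREATE TABLE fact_sales (
            time_id     INTEGER REFERENCES dim_time(time_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            counts      INTEGER,
            dollars     REAL
        );
        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO dim_product  VALUES (10, 'Hardware'), (11, 'Software');
        INSERT INTO dim_time     VALUES (100, 2020), (101, 2021);
        INSERT INTO fact_sales VALUES
            (100, 10, 1, 2, 200.0),
            (100, 11, 1, 1, 50.0),
            (101, 10, 2, 3, 300.0),
            (101, 11, 2, 4, 200.0);
    """)

def dollars_by_year_and_category(conn):
    """Multi-dimensional question: dollars sliced by time and product."""
    return conn.execute("""
        SELECT t.year, p.category, SUM(f.dollars)
        FROM fact_sales f
        JOIN dim_time    t ON t.time_id    = f.time_id
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY t.year, p.category
        ORDER BY t.year, p.category
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
star_result = dollars_by_year_and_category(conn)
print(star_result)
```

Every question that needs a new slice (for example, by channel or campaign) requires another dimension table and another key on the fact table, which is where the maintenance cost comes from.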
Difficulties Develop…
…As long as the analytical processes or data never change…
• But they do – they are unpredictable, fluid, always changing
The result?
• Slowly changing dimension maintenance skyrockets
• Constant need for new dimensions
• Need for new (mostly redundant) star schemas
• Loss of flexibility and agility!
Analytical environments become nightmares of complexity
Business community is not amused…
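
As an illustration of the slowly changing dimension burden, here is a minimal sketch of one common maintenance pattern, usually called Type 2, where each change closes the current row and appends a new version. The slides do not say which pattern is meant, so the choice of Type 2, the CustomerRow fields, and the sample data are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_type2_change(rows: List[CustomerRow], customer_id: str,
                       new_city: str, change_date: date) -> None:
    """Close the current version of the customer and append a new one."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # nothing changed, so no new version is needed
        current.valid_to = change_date
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, change_date))

dim_customer = [CustomerRow(1, "C-1", "Boulder", date(2019, 1, 1))]
apply_type2_change(dim_customer, "C-1", "Denver", date(2021, 6, 1))
apply_type2_change(dim_customer, "C-1", "Denver", date(2021, 7, 1))  # no-op
for row in dim_customer:
    print(row)
```

This logic has to run for every tracked attribute of every dimension on every load, and fact rows must point at the right version, which is why the effort grows quickly as dimensions multiply.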
Death of the Star Schema
There Must Be A Better Way!
Hurrah for technological advances! Today:
1. Cloud storage of data
2. In-memory
3. New query engines
First Leg: Columnar Storage of Data
Data is stored in the cloud (Parquet)
Much reduced costs (elasticity of cloud implementations)
Data storage orchestration over different storage media
• RAM (Random Access Memory)
• SSD (Solid State Drive)
• HDD (Hard Disk Drive, spinning disks)
Optimization that improves performance through better I/O, use of query engines, and columnar/in-memory storage
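
A rough sketch of why a columnar layout helps: a query touches only the columns it needs, and columns with repeated values compress well. This is plain Python for illustration only; real columnar formats such as Parquet use more sophisticated encodings, and the sample rows and function names here are assumptions.

```python
from itertools import groupby

rows = [
    {"region": "EU", "product": "A", "dollars": 10.0},
    {"region": "EU", "product": "B", "dollars": 20.0},
    {"region": "EU", "product": "A", "dollars": 5.0},
    {"region": "US", "product": "A", "dollars": 7.5},
    {"region": "US", "product": "B", "dollars": 2.5},
]

def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

columns = to_columns(rows)

# An aggregate over one measure reads a single column, not whole rows.
total_dollars = sum(columns["dollars"])

# A sorted, low-cardinality column shrinks to a handful of runs.
encoded_region = run_length_encode(columns["region"])

print(total_dollars)
print(encoded_region)
```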
Second Leg: In-Memory
Most recently, reduced costs of memory mean data can now reside there rather than on disk
• Optimizes performance for queries by eliminating requests to disk-stored data
• Improves scalability with decreased cost of memory
Third Leg: New Query Engines
New query engines are what make star schemas irrelevant
• These engines provide real-time joins between complex data tables = virtual star schemas
• They create the needed aggregations at the same time
• This yields much-needed flexibility in the number of queries resolved
From: www.biodataanalysis.de
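
The idea of a "virtual star schema" can be sketched as joining and aggregating at query time over the source tables, with no denormalized copy built in advance. The hash join below is a generic textbook technique, not a description of how any particular engine works internally; the tables and the join_and_aggregate helper are hypothetical.

```python
from collections import defaultdict

customers = [
    {"customer_id": 1, "segment": "Enterprise"},
    {"customer_id": 2, "segment": "SMB"},
]
orders = [
    {"order_id": 100, "customer_id": 1, "dollars": 250.0},
    {"order_id": 101, "customer_id": 2, "dollars": 40.0},
    {"order_id": 102, "customer_id": 1, "dollars": 60.0},
]

def join_and_aggregate(fact_rows, dim_rows, key, group_by, measure):
    """Hash-join facts to a dimension and sum a measure in the same pass."""
    lookup = {row[key]: row for row in dim_rows}  # build side of the join
    totals = defaultdict(float)
    for fact in fact_rows:                        # probe side of the join
        dim = lookup.get(fact[key])
        if dim is not None:
            totals[dim[group_by]] += fact[measure]
    return dict(totals)

dollars_by_segment = join_and_aggregate(
    orders, customers, key="customer_id", group_by="segment", measure="dollars"
)
print(dollars_by_segment)
```

Because the join happens per query, asking for a different grouping column is just a different argument rather than a schema redesign.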
Death of the Star Schema
With all three legs in place, a star schema is replaced easily
• Data is quickly ingested and integrated
• ETL process is simplified by removing star schema creation/maintenance from it
• Data from many complex data tables is quickly joined and presented
• For example, a fact joined to a fact is almost impossible to do in star schema implementations
• With the improved performance as discussed, this is now possible!
From: www.newsweek.com
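
A small sketch of what a fact-to-fact join involves, using Python's built-in sqlite3 module. The two fact tables (fact_orders and fact_shipments) and their rows are hypothetical. Joining them directly on the shared key multiplies rows, so each side is first aggregated to the common grain (one row per order) and then joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_orders    (order_id INTEGER, line_no INTEGER, ordered_qty INTEGER);
    CREATE TABLE fact_shipments (order_id INTEGER, shipment_no INTEGER, shipped_qty INTEGER);
    INSERT INTO fact_orders    VALUES (1, 1, 5), (1, 2, 3), (2, 1, 4);
    INSERT INTO fact_shipments VALUES (1, 1, 4), (1, 2, 4), (2, 1, 1);
""")

# Naive join: order 1 has 2 lines and 2 shipments, so every line is
# counted twice and the ordered quantity is inflated.
naive_ordered_qty = conn.execute("""
    SELECT SUM(o.ordered_qty)
    FROM fact_orders o
    JOIN fact_shipments s ON s.order_id = o.order_id
    WHERE o.order_id = 1
""").fetchone()[0]

# Aggregate each fact to one row per order, then join the two results.
fact_to_fact = conn.execute("""
    SELECT o.order_id, o.ordered_qty, s.shipped_qty
    FROM (SELECT order_id, SUM(ordered_qty) AS ordered_qty
          FROM fact_orders GROUP BY order_id) AS o
    JOIN (SELECT order_id, SUM(shipped_qty) AS shipped_qty
          FROM fact_shipments GROUP BY order_id) AS s
      ON s.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()

print(naive_ordered_qty)
print(fact_to_fact)
```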
Benefits of Eliminating Star Schemas
Benefits of Star Schema-less Environment
[Figure labels: Individualized, Reusable, Artistic, Experimental, Industrial]
Built-for-purpose, based on users and queries
Flexibility and agility return to the data warehouse environment
• Business users can ask impromptu questions – with virtually unlimited dimensionality
• They can use much more complex, detailed data
• All while receiving better response times
Maintenance is greatly simplified!
• Design sessions are reduced
• ETL is simplified
• Maintenance is lessened
Data storage requirements are reduced
• Columnar storage compresses the data
• No indexes are needed
Developers are freed up to do more valuable activities than maintaining star schemas
• They can focus on increased availability and volumes of new data sources
• They can focus on more advanced forms of analyses and experimentation capabilities
Re-evaluating star schemas can uncover unknown errors
Getting Started
Many organizations have “legacy” data warehouses. If yours does, here are the steps for migrating to a star schema-less environment:
01 Evaluate your ETL processes
• Determine where the star schema bottlenecks are
• Decide which star schemas are particularly burdensome in terms of creation/maintenance
• Target these for migration
02 Group selected star schemas by the business problems they solve
• Prioritize those business problem stars by their criticality, maintenance difficulty, and requests for updates
• Each grouping may become its own project
• This gives you a clear path forward
03 Begin analyzing the detailed data from which the star schema was developed
• This data can add even more flexibility and agility to the overall environment
• You may discover errors in previous implementations
• It’s also a quick win for developers & business users
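
The grouping and prioritization in step 02 could be sketched as a simple scoring exercise. Everything here is hypothetical: the StarSchema fields, the 1-to-5 rating scales, the weights, and the sample schemas are placeholders to adapt, not a method given in the webinar.

```python
from dataclasses import dataclass

@dataclass
class StarSchema:
    name: str
    business_problem: str
    criticality: int             # 1 (low) to 5 (high)
    maintenance_difficulty: int  # 1 (easy) to 5 (painful)
    update_requests: int         # open change requests

def priority_score(star, weights=(3, 2, 1)):
    """Weighted sum of the three criteria; the weights are arbitrary."""
    w_crit, w_maint, w_req = weights
    return (w_crit * star.criticality
            + w_maint * star.maintenance_difficulty
            + w_req * star.update_requests)

def migration_order(stars):
    """Group stars by business problem, then rank groups by their top score."""
    groups = {}
    for star in stars:
        groups.setdefault(star.business_problem, []).append(star)
    return sorted(
        groups,
        key=lambda problem: max(priority_score(s) for s in groups[problem]),
        reverse=True,
    )

stars = [
    StarSchema("sales_daily", "revenue reporting", 5, 4, 6),
    StarSchema("sales_monthly", "revenue reporting", 4, 2, 1),
    StarSchema("hr_headcount", "workforce planning", 2, 3, 0),
    StarSchema("inventory_levels", "supply chain", 4, 5, 3),
]
order = migration_order(stars)
print(order)
```

Each entry in the resulting list corresponds to one candidate migration project.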
04 Create a migration path
• Move the set of star schema data for each business problem into the new environment according to the priority schedule
• Quick win!
05 Expand data acquisition horizons
• There is data that you might have thought was beyond your development capabilities
• BUT data volumes, query performance, and time to delivery are not big problems now
06 Life is good!
• Reduced burden of star schema design, creation, & maintenance means freed-up time for development
• Use that time to begin reducing backlogs of analytical requests
07 If you have a green field situation – lucky you!
• You still need to understand the business users’ needs but go beyond those needs and embellish
• You still need to determine how much ETL and data quality processing will be required
• Matthew will talk about a new approach to analytics in the next section
Summary
Given the advances in analytical technologies, it is time to rethink data warehouse design and processes
• You still need a design phase BUT star schema design is no longer a mandatory step
• You still need a repository of analytical data
• You still need ETL or some form of data integration and quality processes BUT less of it
• You still need to perform maintenance on the stored data BUT there is less of it, no indexes, and simpler data schemas
You can now solve many of the past, difficult problems
• By bringing better, faster, and more flexible decision-making into your organization
From: LifeIsGood.com
Star Schemas in the Real World
Powerful Insights … but with a huge supporting cast
“Modern” Data Architecture
A Complex and Inflexible Nightmare That Limits Insights from Perishable Data
[Diagram]
Business data sources: Human Resources, Finance, Supply Chain, Tools
Raw data zone: Data Lake
Refined data zone: Data Warehouses
Business data zone: Star Schemas
Stages: Extract 100%, Transform 25%, Aggregate 10%
Data Ingest/Loading
Querying 3NF / Bronze Data
THE “MODERN” WAY: Bringing data to BI
• Question! New data?
• Call IT: get on a list
• Transform data: lots of SQL/ETL
• Prep data: cubes & marts
• Ready! Only a few weeks later!
• Weeks of work, and you do it all again for every new question
THE AGILE WAY: Bringing BI to the data
• Question!
• I see it already and I can load it myself
• New insights within minutes
Data Architecture to Transform Business
What Changes When You Deliver 100% of Your Data for Analytics
Incorta Unified Data & Analytics Platform
Data Acquisition (LOADER SERVICE): Connectors, Parallel Data Loader, Schema Detection, Direct Data Mapping
Data Enrichment (Spark Cluster, Advanced Analytics & Machine Learning): Data Science Notebooks, Custom Logic, Materialized Views, Machine Learning
Shared Storage: Metadata Admin, Parquet Columnar Storage, Direct Data Map
Data Analytics (ANALYTICS SERVICE): In-Memory Analytics Engine; Business Views, Security
Data Visualization
SQL / Open Access
Aligning Data Architecture to Business Needs
Data Management Body of Knowledge Definition
“Data Architecture… defines the blueprint for managing data assets by aligning with organizational strategy…”
Blueprints Provide a Huge Head Start
Pre-Built Dashboards and Schemas Get You Up and Running Quickly on Enterprise Data
[Diagram labels: Raw tables, Helper tables, Aggregated, Business Views, Blueprints, Business logic]
Essential Components for Modern Data Architecture
From Raw Data to Actionable Insights
Demo
Q&A
SEE YA LATER STAR SCHEMA
Find out why the world’s most valuable companies rely on Incorta to acquire, enrich, analyze and act on data with unmatched speed.
START YOUR CLOUD TRIAL TODAY
cloud.incorta.com/signup
The Direct Data Platform™
