SQL Bits X

Temporal Snapshot Fact
Table
A new approach to data snapshots
Davide Mauri
dmauri@solidq.com
Mentor - SolidQ
EXEC sp_help 'Davide Mauri'
• Microsoft SQL Server MVP
• Has worked with SQL Server since 6.5, on BI
  since 2003
• Specialized in Data Solution Architecture,
  Database Design, Performance Tuning, BI
• President of UGISS (Italian SQL Server UG)
• Mentor @ SolidQ
• Twitter: mauridb
• Blog: http://sqlblog.com/blogs/davide_mauri
Agenda
• The problem
• The possible «classic» solutions
• Limits of «classic» solutions
• A temporal approach
• Implementing the solution
• Technical & Functional Challenges
     (and their resolutions)
• Conclusions
The Problem
The request
• Our customer (an insurance company)
  needed a BI solution that would allow
  them to do some “simple” things:
  – Analyze the value/situation of all insurances
    at a specific point in time
  – Analyze all the data from 1970 onwards
  – Analyze data on daily basis
    • Absolutely no weekly or monthly aggregation
The environment
• On average, 3,000,000 insurance-related
  documents need to be stored.
  – Each day.
• Data is mainly stored in a DB2 mainframe
  – Data is (sometimes) stored in a “temporal”
    fashion
    • For each document a new row is created each
      time something changes
    • Each row has “valid_from” and “valid_to” columns
Let's do the math
• To keep the daily snapshot
  – of 3,000,000 documents
  – for 365 days
  – for 40 years


• A fact table of nearly 44 BILLION rows
  would be needed
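The arithmetic behind that figure is easy to verify (a quick sanity check, not part of the original deck):

```python
# Daily snapshot row count: documents per day x days per year x years.
documents_per_day = 3_000_000
days_per_year = 365
years = 40

rows = documents_per_day * days_per_year * years
print(f"{rows:,}")  # 43,800,000,000 -> nearly 44 billion rows
```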
The possible solutions
• How can the problem be solved?
  – PDW and/or brute force were not an option


• Unfortunately, none of the three well-known
  approaches works here
  – Transaction Fact Table
  – Periodic Snapshot Fact Table
  – Accumulating Snapshot Fact Table


• A new approach is needed!
The classic solutions
 And why they cannot be used here
Transaction Fact Table
• One row per fact occurring at a certain point
  in time
• We don't have facts for every day. For
  example, for document 123, no changes
  were made on 20 August 2011
  – Analysis on that date would not show the
    document 123
  – “LastNonEmpty” aggregation cannot help, since
    dates don't have children
  – MDX can come to the rescue here but…
     • Very high complexity
     • Potential performance problems
Accumulating Snapshot F.T.
• One fact row per entity in the fact table
  – Each row has a lot of date columns that store
    all the changes that happen during the entity's
    lifetime
  – Each column date represents a point in time
    where something changed
     • The number of changes must be known in
       advance
• With so many date columns it can be
  difficult to manage the “Analysis Date”
Periodic Snapshot Fact Table
• The fact table holds a «snapshot» of all
  data for a specific point in time
      – That's the perfect solution, since taking a
        snapshot each day would completely solve
        the problem
     – Unfortunately there is just too much data
     – We need to keep the idea but reduce the
       amount of data



BIA-406-S | Temporal Snapshot Fact Table            12
A temporal approach
 And how it can be applied to BI
Temporal Data
• The snapshot fact table is a good starting
  point
  – Keeping a snapshot of a document for each
    day is just a waste of space
  – And also a big performance killer
• Changes to a document don't happen very
  often
  – Fact tables should be temporal in order to
    avoid data duplication
  – Ideally, each fact row needs a “valid_from”
    and “valid_to” column
Temporal Data
• With Temporal Snapshot Fact Table
  (TSFT) each row represents a fact that
  occurred during a time interval, not at a
  point in time.
• Time intervals will be right-open in order
  to simplify calculations:
            [valid_from, valid_to)
      Which means:
            valid_from <= x < valid_to
Temporal Data
• Now that the idea is set, we have to solve
  several problems:
  – Functional Challenges
      • SSAS doesn't support the concept of intervals, nor
        a “between” filter
      • The user just wants to define the Analysis Date,
        and doesn't want to deal with ranges
  – Technical Challenges
      • Source data may come from several tables and
        has to be consolidated into a small number of
        TSFTs
     • The concept of interval changes the rules of the
       “join” game a little bit…
Technical Challenges
• Source data may come from tables with
  different designs:
      temporal tables
      non-temporal tables
• Temporal Tables
  – Ranges from two different temporal tables
    may overlap
• Non-temporal tables
  – There may be some business rules that
    require us to «break» the existing ranges
Technical Challenges
• Before starting to solve the problem, let's
  generalize it a bit (so that we can study
  it theoretically)
• Imagine a company with this environment
  – There are Items & Sub Items
  – Items have 1 or more Sub Items
     • Each Sub Item has a value in money
     • Value may change during its lifetime
  – Customers have to pay an amount equal to the
    sum of all Sub Items of an Item
      • Customers may pay whenever they want during
        the year, as long as the total amount is paid by
        the end of the year
Data Diving
Technical Challenges
• When joining data, keys are no longer
  sufficient
• Time intervals must also be managed
  correctly, otherwise incorrect results may
  be produced
• Luckily, time operators are well known (but
  not yet implemented)
  – overlaps, contains, meets, etc.
Technical Challenges
• CONTAINS (or DURING) Operator:

  [Diagram: interval i1 = [b1, e1) entirely
  encloses interval i2 = [b2, e2)]

• Means:
           b1 <= b2 AND e1 >= e2
Technical Challenges
• OVERLAPS Operator:

  [Diagram: interval i1 = [b1, e1) and interval
  i2 = [b2, e2) share at least one point]

• Means:
            b1 <= e2 AND b2 <= e1
• Since we're using right-open intervals, a
  little change is needed here:
            b1 < e2 AND b2 < e1
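The two predicates can be sketched in a few lines of Python (an illustrative sketch, not from the slides, using right-open day intervals):

```python
# Interval predicates over right-open intervals [b, e) of dates.
from datetime import date

def contains(b1: date, e1: date, b2: date, e2: date) -> bool:
    """True if [b1, e1) contains [b2, e2)."""
    return b1 <= b2 and e1 >= e2

def overlaps(b1: date, e1: date, b2: date, e2: date) -> bool:
    """True if [b1, e1) and [b2, e2) share at least one day.
    Right-open intervals need strict comparisons here."""
    return b1 < e2 and b2 < e1

# [2009-01-01, 2009-07-01) contains [2009-02-01, 2009-03-01)
assert contains(date(2009, 1, 1), date(2009, 7, 1),
                date(2009, 2, 1), date(2009, 3, 1))
# Adjacent right-open intervals only "meet", they don't overlap:
assert not overlaps(date(2009, 1, 1), date(2009, 2, 1),
                    date(2009, 2, 1), date(2009, 3, 1))
```

With closed intervals the adjacent case above would wrongly count as an overlap, which is exactly why the strict comparison is needed.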
Technical Challenges
• Before going any further, it's best we start
  to «see» the data we're going to use


  [Timeline chart, Nov 2008 – Mar 2010: change
  points (A, B, C, D, E) for Item 1, Sub Items 1.1
  and 1.2, Item 2 and Sub Item 2.1]
JOINs & Co.
Technical Challenges
• As you have seen, even with just three
  tables, the joins are utterly complex!

• But bearing this in mind, we can now find
  an easier, logically equivalent, solution.

• Drawing a timeline and visualizing the
  result helps a lot here.
Technical Challenges

  [Diagram: ranges for one Item, ranges for the
  Sub Item, and the payment dates, combined into
  a «summarized» status of six intervals]

• Intervals represent a time span in which
  nothing changed
• So, the other way round, dates represent a
  point in time at which something changed!
Technical Challenges
• In simpler words, we need to find the
  minimum set of time intervals for each
  entity
  – The concept of "granularity" must be changed
    in order to take into account the idea of time
    intervals
  – All fact tables, for each stored entity, must
    have the same granularity for the time interval
     • interval granularity must be the same “horizontally”
       (across tables)
     • interval granularity must be specific for each entity
       (across rows: “vertically”)
Technical Challenges
• The problem can be solved by
  «refactoring» data.
• Just like refactoring code, but working on
  data
  – Rephrased idea: “disciplined technique for
    restructuring existing temporal data, altering
    its internal structure without changing its
    value”
     • http://en.wikipedia.org/wiki/Code_refactoring
Technical Challenges
• Here‟s how:
• Step 1
  – For each item, for
    each temporal table,
    union all the distinct
    ranges
Technical Challenges
• Step 2
  – For each item, take all the
    distinct:
  – Dates from the previous step
  – Dates that are known to
    «break» a range
  – The first day of each year
    used in ranges
Technical Challenges
• Step 3
  – For each item, create
    the new ranges, doing
    the following steps
     • Order the rows by date
     • Take the «k» row and
       the «k+1» row
     • Create the range
            [Date_k, Date_k+1)

  The new LEAD operator is exactly
  what we need! 
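Step 3 can be sketched in Python (illustrative only; in T-SQL the same pairing is done with LEAD() OVER (PARTITION BY item ORDER BY date)):

```python
# Pair each distinct date with the next one to build the
# right-open ranges [Date_k, Date_k+1) for one entity.
from datetime import date

def build_ranges(dates):
    ordered = sorted(set(dates))          # distinct dates, in order
    return list(zip(ordered, ordered[1:]))  # (k, k+1) pairs

dates = [date(2009, 1, 1), date(2009, 3, 15),
         date(2009, 3, 15), date(2009, 8, 10)]
print(build_ranges(dates))
# [(date(2009, 1, 1), date(2009, 3, 15)),
#  (date(2009, 3, 15), date(2009, 8, 10))]
```

Note that n distinct dates always produce n-1 ranges, since the last date only closes the final interval.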
Technical Challenges
• When the time granularity is the day, the
  right-open range helps a lot here.
  – E.g.: let's say that at some point, for an entity,
    you have the following dates that have to be
    turned into ranges:
Technical Challenges
• With closed ranges, intervals can be
  ambiguous, especially single-day intervals:




  – The meaning is not explicitly clear and we
    would have to take care of it, making the ETL
    phase more complex.
  – It's better to make everything explicit
Technical Challenges
• So, to avoid ambiguity, with a closed range
  the resulting set of intervals also needs to
  take time into account:




  – What granularity should be used for time?
     • Hours? Minutes? Microseconds?
  – It's just a waste of space
  – The Date or Int data types cannot be used
Technical Challenges
• With a right-open interval everything is
  easier:




  – The Date or Int data types are enough
  – No time component is used, so there is no
    need to deal with it
Technical Challenges
• Step 4
  – «Explode» all the temporal source tables so
    that all data will use the same (new) ranges,
    using the CONTAINS operator:
Technical Challenges
• The original Range:


Becomes the equivalent
set of ranges:



If we had chosen a daily snapshot,
425 rows would have been
generated… not just 12!
(That's a 35:1 ratio!)
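Step 4, the «explode», can be sketched like this (an illustrative Python sketch with made-up column names; the real ETL would do this as a set-based join in SQL):

```python
# «Explode» one original temporal row onto the new common set of
# ranges: keep every new range that the original row CONTAINS.
from datetime import date

def explode(row, new_ranges):
    """row = (key, valid_from, valid_to, value); intervals right-open."""
    key, b1, e1, value = row
    return [(key, b2, e2, value)
            for (b2, e2) in new_ranges
            if b1 <= b2 and e1 >= e2]   # the CONTAINS predicate

row = ("Item 1.1", date(2009, 1, 1), date(2009, 7, 1), 100)
new_ranges = [(date(2009, 1, 1), date(2009, 3, 1)),
              (date(2009, 3, 1), date(2009, 7, 1)),
              (date(2009, 7, 1), date(2009, 9, 1))]
for r in explode(row, new_ranges):
    print(r)
# Only the first two ranges are kept: the third one
# falls outside the original row's validity interval.
```

After every source table has been exploded this way, all rows share the same interval grain and can be joined on the keys plus (valid_from, valid_to) equality.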
Technical Challenges
• Now we have all the source data
  conforming to a common set of intervals
  – This means that we can just join tables
    without having to do complex temporal joins


• All fact tables will also have the same
  interval granularity

• We solved all the technical problems! 
SSIS In Action
Functional Challenges
• SSAS doesn't support time intervals, but…

• Using some imagination we can say that
  – An interval is made of 1 to “n” dates
  – A date belongs to 1 to “n” intervals


• We can model this situation with a Many-
  To-Many relationship!
Functional Challenges
• The «interval dimension» will then be
  hidden and the user will just see the
  «Analysis Date» dimension

• When such a date is selected, all the fact
  rows whose interval contains that date
  will be selected too, giving us the
  solution we're looking for 
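The factless bridge behind that many-to-many relationship can be generated with a simple unrolling of each range into its dates (an illustrative sketch with hypothetical names; the real bridge would be built in the ETL):

```python
# Build the factless Date-DateRange bridge: one row per
# (analysis date, range) pair, right-open on valid_to.
# Selecting a date then pulls in every range containing it,
# and hence every fact row keyed on those ranges.
from datetime import date, timedelta

def build_bridge(ranges):
    """ranges: {range_id: (valid_from, valid_to)}, right-open."""
    bridge = []
    for range_id, (b, e) in ranges.items():
        d = b
        while d < e:                  # right-open: valid_to excluded
            bridge.append((d, range_id))
            d += timedelta(days=1)
    return bridge

ranges = {1: (date(2009, 8, 9), date(2009, 8, 11)),
          2: (date(2009, 8, 10), date(2009, 8, 12))}
bridge = build_bridge(ranges)
# 10 Aug 2009 is contained in both ranges:
print([rid for (d, rid) in bridge if d == date(2009, 8, 10)])  # [1, 2]
```

This is the trade-off of the approach: the bridge grows with range length in days, but the fact table itself stays at one row per interval.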
Functional Challenges

  [Diagram: Date Dimension → Factless
  Date-DateRange table → DateRange Dimension →
  Fact Table]

• The user wants to analyze data on 10 Aug 2009
• 10 Aug is contained in two ranges
• SSAS uses all the found ranges
• All the (three) fact rows related to those
  ranges are read
All Together!
The final recap
 And a quick recipe to make everything easier
Steps for the DWH
• For each entity:
  – Get all “valid_from” and “valid_to” columns
    from all tables
  – Get all the dates that will break intervals from
    all tables
  – Take all the gathered dates once (removing
    duplicates) and generate the new intervals
  – «Unpack» original tables in order to use the
    newly generated intervals
Steps for the CUBE
• Generate the usual Date dimension
  – Generate a «Date Range» dimension
  – Generate a Factless Date-DateRange table
  – Generate the fact table with reference to the
    Date Range dimension
  – Create the M:N relationship
Conclusion and Improvements
 Some ideas for the future
Conclusion and improvements
• The defined pattern can be applied whenever
  a daily analysis is needed and data is not
  additive
  – So, each time you need to “snapshot” something

• The approach can also be used if source
  data is not temporal, as long as you can turn
  it into a temporal format

• Performance can be improved by
  partitioning the cube by year or even by
  month
Conclusion and improvements
• In the end, it's a very simple solution 

• Everything you've seen is the summary of
  several months of work by the Italian
  team. Thanks guys!
Questions?
Thanks!




© 2012 SolidQ

 
Dapper: the microORM that will change your life
Dapper: the microORM that will change your lifeDapper: the microORM that will change your life
Dapper: the microORM that will change your life
Davide Mauri
 
When indexes are not enough
When indexes are not enoughWhen indexes are not enough
When indexes are not enough
Davide Mauri
 
Building a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureBuilding a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with Azure
Davide Mauri
 
SSIS Monitoring Deep Dive
SSIS Monitoring Deep DiveSSIS Monitoring Deep Dive
SSIS Monitoring Deep Dive
Davide Mauri
 
Azure SQL & SQL Server 2016 JSON
Azure SQL & SQL Server 2016 JSONAzure SQL & SQL Server 2016 JSON
Azure SQL & SQL Server 2016 JSON
Davide Mauri
 
SQL Server & SQL Azure Temporal Tables - V2
SQL Server & SQL Azure Temporal Tables - V2SQL Server & SQL Azure Temporal Tables - V2
SQL Server & SQL Azure Temporal Tables - V2
Davide Mauri
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
Davide Mauri
 
Azure ML: from basic to integration with custom applications
Azure ML: from basic to integration with custom applicationsAzure ML: from basic to integration with custom applications
Azure ML: from basic to integration with custom applications
Davide Mauri
 
Event Hub & Azure Stream Analytics
Event Hub & Azure Stream AnalyticsEvent Hub & Azure Stream Analytics
Event Hub & Azure Stream Analytics
Davide Mauri
 
SQL Server 2016 JSON
SQL Server 2016 JSONSQL Server 2016 JSON
SQL Server 2016 JSON
Davide Mauri
 
Back to the roots - SQL Server Indexing
Back to the roots - SQL Server IndexingBack to the roots - SQL Server Indexing
Back to the roots - SQL Server Indexing
Davide Mauri
 
Schema less table & dynamic schema
Schema less table & dynamic schemaSchema less table & dynamic schema
Schema less table & dynamic schema
Davide Mauri
 
BIML: BI to the next level
BIML: BI to the next levelBIML: BI to the next level
BIML: BI to the next level
Davide Mauri
 
Data juice
Data juiceData juice
Data juice
Davide Mauri
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science Overview
Davide Mauri
 
Delayed durability
Delayed durabilityDelayed durability
Delayed durability
Davide Mauri
 
Hekaton: In-memory tables
Hekaton: In-memory tablesHekaton: In-memory tables
Hekaton: In-memory tables
Davide Mauri
 

More from Davide Mauri (19)

Azure serverless Full-Stack kickstart
Azure serverless Full-Stack kickstartAzure serverless Full-Stack kickstart
Azure serverless Full-Stack kickstart
 
Agile Data Warehousing
Agile Data WarehousingAgile Data Warehousing
Agile Data Warehousing
 
Dapper: the microORM that will change your life
Dapper: the microORM that will change your lifeDapper: the microORM that will change your life
Dapper: the microORM that will change your life
 
When indexes are not enough
When indexes are not enoughWhen indexes are not enough
When indexes are not enough
 
Building a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureBuilding a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with Azure
 
SSIS Monitoring Deep Dive
SSIS Monitoring Deep DiveSSIS Monitoring Deep Dive
SSIS Monitoring Deep Dive
 
Azure SQL & SQL Server 2016 JSON
Azure SQL & SQL Server 2016 JSONAzure SQL & SQL Server 2016 JSON
Azure SQL & SQL Server 2016 JSON
 
SQL Server & SQL Azure Temporal Tables - V2
SQL Server & SQL Azure Temporal Tables - V2SQL Server & SQL Azure Temporal Tables - V2
SQL Server & SQL Azure Temporal Tables - V2
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
Azure ML: from basic to integration with custom applications
Azure ML: from basic to integration with custom applicationsAzure ML: from basic to integration with custom applications
Azure ML: from basic to integration with custom applications
 
Event Hub & Azure Stream Analytics
Event Hub & Azure Stream AnalyticsEvent Hub & Azure Stream Analytics
Event Hub & Azure Stream Analytics
 
SQL Server 2016 JSON
SQL Server 2016 JSONSQL Server 2016 JSON
SQL Server 2016 JSON
 
Back to the roots - SQL Server Indexing
Back to the roots - SQL Server IndexingBack to the roots - SQL Server Indexing
Back to the roots - SQL Server Indexing
 
Schema less table & dynamic schema
Schema less table & dynamic schemaSchema less table & dynamic schema
Schema less table & dynamic schema
 
BIML: BI to the next level
BIML: BI to the next levelBIML: BI to the next level
BIML: BI to the next level
 
Data juice
Data juiceData juice
Data juice
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science Overview
 
Delayed durability
Delayed durabilityDelayed durability
Delayed durability
 
Hekaton: In-memory tables
Hekaton: In-memory tablesHekaton: In-memory tables
Hekaton: In-memory tables
 

Recently uploaded

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 

Recently uploaded (20)

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 

Temporal Snapshot Fact Tables

  • 10. Transaction Fact Table • One row per fact occurring at a certain point in time • We don't have facts for every day. For example, for document 123 no changes were made on 20 August 2011 – Analysis on that date would not show document 123 – "LastNonEmpty" aggregation cannot help, since dates don't have children – MDX can come to the rescue here, but… • Very high complexity • Potential performance problems
  • 11. Accumulating Snapshot F.T. • One fact row per entity in the fact table – Each row has a lot of date columns that store all the changes that happen during the entity's lifetime – Each date column represents a point in time where something changed • The number of changes must be known in advance • With so many date columns it can be difficult to manage the "Analysis Date"
  • 12. Periodic Snapshot Fact Table • The fact table holds a «snapshot» of all data for a specific point in time – That's the perfect solution, since taking a snapshot each day would completely solve the problem – Unfortunately there is just too much data – We need to keep the idea but reduce the amount of data BIA-406-S | Temporal Snapshot Fact Table 12
  • 13. A temporal approach And how it can be applied to BI
  • 14. Temporal Data • The snapshot fact table is a good starting point – Keeping a snapshot of a document for each day is just a waste of space – And also a big performance killer • Changes to documents don't happen very often – Fact tables should be temporal in order to avoid data duplication – Ideally, each fact row needs a "valid_from" and a "valid_to" column
  • 15. Temporal Data • With the Temporal Snapshot Fact Table (TSFT), each row represents a fact that occurred during a time interval, not at a point in time. • Time intervals will be right-open in order to simplify calculations: [valid_from, valid_to) Which means: valid_from <= x < valid_to
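The right-open convention translates directly into code. A minimal sketch in Python (the deck itself works in T-SQL, so the function name here is purely illustrative):

```python
from datetime import date

def contains_point(valid_from: date, valid_to: date, x: date) -> bool:
    """True when x falls in the right-open interval [valid_from, valid_to)."""
    return valid_from <= x < valid_to

# A row valid for all of January 2010:
assert contains_point(date(2010, 1, 1), date(2010, 2, 1), date(2010, 1, 31))
# The upper bound is excluded, so consecutive rows never overlap:
assert not contains_point(date(2010, 1, 1), date(2010, 2, 1), date(2010, 2, 1))
```

Because `valid_to` is excluded, the next version of a document can start exactly at the previous version's `valid_to` with no gap and no double counting.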
  • 16. Temporal Data • Now that the idea is set, we have to solve several problems: – Functional Challenges • SSAS doesn't support the concept of intervals, nor the between filter • The user just wants to define the Analysis Date, and doesn't want to deal with ranges – Technical Challenges • Source data may come from several tables and has to be consolidated into a small number of TSFTs • The concept of interval changes the rules of the "join" game a little bit…
  • 17. Technical Challenges • Source data may come from tables with different designs: temporal tables and non-temporal tables • Temporal tables – Ranges from two different temporal tables may overlap • Non-temporal tables – There may be some business rules that require us to «break» the existing ranges
  • 18. Technical Challenges • Before starting to solve the problem, let's generalize it a bit (in order to be able to study it theoretically) • Imagine a company with this environment – There are Items & Sub Items – Items have 1 or more Sub Items • Each Sub Item has a value in money • The value may change during its lifetime – Customers have to pay an amount equal to the sum of all Sub Items of an Item • Customers may pay whenever they want during the year, as long as the total amount is paid by the end of the year
  • 20. Technical Challenges • When joining data, keys are no longer sufficient • Time intervals must also be managed correctly, otherwise incorrect results may be produced • Luckily, time operators are well known (but not yet implemented) – overlaps, contains, meets, etc.
  • 21. Technical Challenges • CONTAINS (or DURING) operator, between interval i1 = [b1, e1) and interval i2 = [b2, e2) • Means: b1 <= b2 AND e1 >= e2
  • 22. Technical Challenges • OVERLAPS operator, between interval i1 = [b1, e1) and interval i2 = [b2, e2) • Means: b1 <= e2 AND b2 <= e1 • Since we're using right-open intervals, a little change is needed here: b1 < e2 AND b2 < e1
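Both predicates are one-liners in any language. A sketch in Python, assuming right-open intervals throughout (the function names are illustrative, not from the deck):

```python
def contains(b1, e1, b2, e2):
    """[b1, e1) CONTAINS (DURING) [b2, e2): b1 <= b2 AND e1 >= e2."""
    return b1 <= b2 and e1 >= e2

def overlaps(b1, e1, b2, e2):
    """OVERLAPS for right-open intervals: b1 < e2 AND b2 < e1."""
    return b1 < e2 and b2 < e1

assert contains(1, 10, 3, 7)
assert overlaps(1, 5, 4, 9)
assert not overlaps(1, 5, 5, 9)  # [1,5) and [5,9) merely meet, they don't overlap
```

The strict `<` in `overlaps` is exactly the "little change" right-open intervals require: two adjacent ranges that share only a boundary date do not overlap.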
  • 23. Technical Challenges • Before going any further it's best we start to «see» the data we're going to use (Timeline diagram: from Nov 2008 to Mar 2010, Item 1 with Sub Items 1.1 and 1.2, and Item 2 with Sub Item 2.1, each moving through states A to E over different intervals)
  • 25. Technical Challenges • As you have seen, even with just three tables, the joins are utterly complex! • But bearing this in mind, we can now find an easier, logically equivalent, solution. • Drawing a timeline and visualizing the result helps a lot here.
  • 26. Technical Challenges (Diagram: range for one Item, range for the Sub Item, payment dates, and the resulting «summarized» status split into intervals 1–6) • Intervals will represent a time where nothing changed • So, the other way round, dates represent a time in which something changed!
  • 27. Technical Challenges • In simpler words, we need to find the minimum set of time intervals for each entity – The concept of "granularity" must be changed in order to take into account the idea of time intervals – All fact tables, for each stored entity, must have the same granularity for the time interval • Interval granularity must be the same "horizontally" (across tables) • Interval granularity must be specific for each entity (across rows: "vertically")
  • 28. Technical Challenges • The problem can be solved by «refactoring» data. • Just like refactoring code, but working on data – Rephrased idea: “disciplined technique for restructuring existing temporal data, altering its internal structure without changing its value” • http://en.wikipedia.org/wiki/Code_refactoring
  • 29. Technical Challenges • Here's how: • Step 1 – For each item, for each temporal table, union all the distinct ranges
  • 30. Technical Challenges • Step 2 – For each item, take all the distinct: – Dates from the previous step – Dates that are known to «break» a range – The first day of each year used in ranges
  • 31. Technical Challenges • Step 3 – For each item, create the new ranges, doing the following steps • Order the rows by date • Take the «k» row and the «k+1» row • Create the range [Date_k, Date_k+1) The new LEAD operator is exactly what we need! 
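Steps 1–3 amount to deduplicating and sorting the gathered dates, then pairing each date with the next one — which is exactly what LEAD does over an ordered set in T-SQL. A Python sketch of the same idea (names are illustrative):

```python
from datetime import date

def build_ranges(dates):
    """Pair every distinct date with the next one (a LEAD analogue)
    to produce the minimal set of right-open ranges [d_k, d_k+1)."""
    ds = sorted(set(dates))          # Step 2: distinct dates, ordered
    return list(zip(ds, ds[1:]))     # Step 3: (k, k+1) pairs

# Dates gathered from all temporal tables, plus the known "breaking" dates:
dates = [date(2009, 1, 1), date(2009, 3, 15), date(2009, 1, 1), date(2009, 8, 10)]
ranges = build_ranges(dates)
assert ranges == [(date(2009, 1, 1), date(2009, 3, 15)),
                  (date(2009, 3, 15), date(2009, 8, 10))]
```

Note that with n distinct dates you always get n−1 ranges: the last date only closes the final interval, it never opens one.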
  • 32. Technical Challenges • When the time granularity is the day, the right-open range helps a lot here. – E.g.: let's say that at some point, for an entity, you have a set of dates that have to be turned into ranges:
  • 33. Technical Challenges • With closed ranges, intervals can be ambiguous, especially single-day intervals: – The meaning is not explicitly clear and we would have to take care of it, making the ETL phase more complex. – It's better to make everything explicit
  • 34. Technical Challenges • So, to avoid ambiguity, with a closed range the resulting set of intervals also needs to take time into account: – What granularity to use for time? • Hour? Minute? Microseconds? – It's just a waste of space – The Date or Int datatypes cannot be used
  • 35. Technical Challenges • With a right-open interval everything is easier: – The Date or Int datatypes are enough – No time component is used, so there's no need to deal with it
  • 36. Technical Challenges • Step 4 – «Explode» all the temporal source tables so that all data will use the same (new) ranges, using the CONTAINS operator
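The «explosion» of Step 4 is just a join on the CONTAINS predicate: each source row is replaced by every new range it contains, with its measures repeated on each. A sketch in Python, using plain integers as stand-in "day numbers" for brevity (the real implementation is T-SQL/SSIS):

```python
def explode(row_from, row_to, new_ranges):
    """Replace one source range [row_from, row_to) with every common
    range it CONTAINS (row_from <= b AND row_to >= e), so that all
    tables end up on the same shared set of intervals."""
    return [(b, e) for (b, e) in new_ranges if row_from <= b and row_to >= e]

# The common ranges produced in Step 3:
new_ranges = [(1, 3), (3, 6), (6, 9)]
# A source row covering [1, 6) maps onto the first two common ranges:
assert explode(1, 6, new_ranges) == [(1, 3), (3, 6)]
```

Since Step 3 built the common ranges from every boundary date in every table, each source range is always an exact union of common ranges — the CONTAINS filter never truncates data.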
  • 37. Technical Challenges • The original range becomes an equivalent set of ranges • If we had chosen a daily snapshot, 425 rows would have been generated… not only 12! (That's roughly a 35:1 ratio!)
  • 38. Technical Challenges • Now we have all the source data conforming to a common set of intervals – This means that we can just join tables without having to do complex temporal joins • All fact tables will also have the same interval granularity • We solved all the technical problems! 
  • 40. Functional Challenges • SSAS doesn't support time intervals but… • Using some imagination we can say that – An interval is made of 1 to "n" dates – A date belongs to 1 to "n" intervals • We can model this situation with a Many-to-Many relationship!
  • 41. Functional Challenges • The «interval dimension» will then be hidden and the user will just see the «Analysis Date» dimension • When such a date is selected, all the fact rows that have an interval containing that date will be selected too, giving us the solution we're looking for 
  • 42. Functional Challenges (Diagram) • The user wants to analyze data on 10 Aug 2009, through the Date dimension • Via the factless Date-DateRange table, 10 Aug is found to be contained in two ranges of the DateRange dimension • SSAS uses all the found ranges: all the (three) fact table rows related to those ranges are read
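The factless Date-DateRange table can be materialized by enumerating every date inside every range; filtering it by the chosen Analysis Date then yields all the ranges — and through them all the fact rows — containing that date, which is the M:N relationship SSAS resolves. A Python sketch of the idea (table and column names are illustrative):

```python
from datetime import date, timedelta

def date_range_bridge(ranges):
    """One (analysis_date, range_id) row per date inside each right-open range."""
    rows = []
    for range_id, (b, e) in enumerate(ranges):
        d = b
        while d < e:                 # right-open: e itself is excluded
            rows.append((d, range_id))
            d += timedelta(days=1)
    return rows

ranges = [(date(2009, 8, 1), date(2009, 8, 20)),   # range 0
          (date(2009, 8, 10), date(2009, 9, 1))]   # range 1
bridge = date_range_bridge(ranges)
# As on the slide, 10 Aug 2009 is contained in two ranges:
assert {rid for (d, rid) in bridge if d == date(2009, 8, 10)} == {0, 1}
```

The bridge grows with the number of days covered, but it carries no measures — hence "factless" — and only the small TSFT itself stores the facts.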
  • 44. The final recap And a quick recipe to make everything easier
  • 45. Steps for the DWH • For each entity: – Get all "valid_from" and "valid_to" columns from all tables – Get all the dates that will break intervals from all tables – Take all the gathered dates once (removing duplicates) and generate new intervals – «Unpack» the original tables in order to use the newly generated intervals
  • 46. Steps for the CUBE • Generate the usual Date dimension – Generate a «Date Range» dimension – Generate a factless Date-DateRange table – Generate the fact table with a reference to the Date Range dimension – Create the M:N relationship
  • 48. Conclusion and Improvements Some ideas for the future
  • 49. Conclusion and improvements • The defined pattern can be applied each time a daily analysis is needed and data is not additive – So each time you need to “snapshot” something • The approach can also be used if source data is not temporal, as long as you can turn it into a temporal format • Performance can be improved by partitioning the cube by year or even by month
  • 50. Conclusion and improvements • In the end, it's a very simple solution  • Everything you've seen is the summary of several months of work by the Italian team. Thanks guys!

Editor's Notes

  1. From BOL: "The member value is evaluated as the value of its last child along the time dimension that contains data"
  2. http://www.kimballgroup.com/html/designtipsPDF/DesignTips2002/KimballDT37ModelingPipeline.pdf
  3. Show SQL Queries
  4. Show the SSIS solution to explode the tables
  5. Show why the M2M solution works (using T-SQL queries). Show the OLAP views. Show the SSAS solution.
  6. Review the complete solution from start to end (maybe using *much* more data?)