Features or concepts like Change Tracking, Change Data Capture, Temporal Tables, and similar delta systems are complex and may carry a stigma or misapprehension in your organization around performance, security, or cost. Yet even if you never implement these features directly, most information systems rely on tracking changes, especially from legacy line-of-business applications. I'm here to show you robust techniques for implementing delta systems in SQL Server to increase the trustworthiness of your data warehouse, and to steer you away from common pitfalls.
3. Change Systems: Agenda
The goal of this series is to give you the tools you need to push analytics forward at your company.
• Understand the nature and importance of change systems in an overall data platform
• Compare and contrast traditional and modern data warehouse architectures
• Discuss a key technology that is core to change systems in the enterprise
• Compare the SQL Server features that enable robust change data capture
4. Overview: The Source of Change
[Diagram: the database engine managing MDF (data) and LDF (log) files.]
• A database engine manages files.
  • Data structures
  • Transaction logs
• Change systems accurately track modifications inside data structures.
• The source of record for change is the transaction log. Using this log directly is a characteristic of passive change systems.
• Active change systems watch the data structure and record observable change.
5. Overview: Modeling Change

Table
AccountID | CustomerID | AccountBalance | ModifyDate
4568456 | 2342 | 1234758.23 | 2017-03-11 04:11:05
4624572 | 9875 | 5768.01 | 2017-03-11 04:13:15
4745733 | 8735 | 478893.33 | 2017-03-11 04:13:01

Log
AccountID | CustomerID | Type | Amount | EventDate
4568456 | 2342 | Deposit | 1198575.32 | 2017-03-08 09:09:04
4624572 | 9875 | Deposit | 4438.70 | 2017-03-08 09:10:01
4745733 | 8735 | Deposit | 460436.02 | 2017-03-07 10:13:20
4568456 | 2342 | Deposit | 528.11 | 2017-03-08 06:13:45
4624572 | 9875 | Deposit | 1345.23 | 2017-03-09 10:22:25
4745733 | 8735 | Deposit | 635.20 | 2017-03-08 11:13:01
4568456 | 2342 | Withdrawal | 23.21 | 2017-03-09 12:12:02
4624572 | 9875 | Fee | 21.34 | 2017-03-09 06:13:45
4745733 | 8735 | Withdrawal | 42.66 | 2017-03-10 13:13:12
4568456 | 2342 | Transfer | 35678.01 | 2017-03-11 04:11:05
4624572 | 9875 | Deposit | 5.42 | 2017-03-11 04:13:15
4745733 | 8735 | Deposit | 17864.77 | 2017-03-11 04:13:01

Table =* Log
*Record the CRUD operations to the table and you get a changelog.

The duality is that a table supports data at rest and logs capture change. If you have a log you can create not only the original table but also a myriad of other derived tables. Logs therefore seem to be a more fundamental data structure.
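To make the duality concrete, here is a minimal T-SQL sketch that rebuilds the table above from the log alone. The table name dbo.AccountEventLog is hypothetical; treating withdrawals and fees as debits and everything else as credits, this query reproduces every row of the table exactly, including each ModifyDate.

-- Derive each account's current state from the immutable event log.
SELECT
    AccountID,
    CustomerID,
    SUM(CASE WHEN [Type] IN ('Withdrawal', 'Fee')
             THEN -Amount ELSE Amount END) AS AccountBalance,
    MAX(EventDate) AS ModifyDate   -- the last event becomes the row's ModifyDate
FROM dbo.AccountEventLog
GROUP BY AccountID, CustomerID;

Any other derived table (daily balances, fee totals, and so on) is just a different query over the same log.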
6. Overview: Modeling Change

Valid Time
John Doe, who lived in Flat Rock, NC, made his first visit to us on April 1st, 1985 and changed his permanent address during a sale on November 12th, 2005.

Name | Address | ValidFrom | ValidTo
John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 1985-04-01 10:00:00 | 2005-11-12 09:05:00
John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-12 09:06:01 | 9999-12-31 23:59:59

Transaction Time
Our data warehouse went live on November 1st, 2005. The ETL runs daily at 4 AM.

Name | Address | CreateDate | ExpireDate
John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 2005-11-01 09:25:11 | 2005-11-13 04:54:11
John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-13 04:54:12 | 9999-12-31 23:59:59
7. Overview: Modeling Change

Source
ID | Name | Address | ModifyDate
12345 | John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 1985-04-01 10:00:00
12345 | John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-12 09:06:01
(The ModifyDate column creates risk.)

--> ETL (latency of one day at best) -->

Target: SCD 2 Dimension
Key | ID | Name | Address | ValidFrom | ValidTo | CreateDate | ExpireDate
1 | 12345 | John Doe | 81 Carl Sandberg Ln, Flat Rock, NC 28731 | 1985-04-01 10:00:00 | 2005-11-12 09:06:00 | 2005-11-01 09:25:11 | 2005-11-13 04:54:11
2 | 12345 | John Doe | 9433 Collingdale Way, Raleigh, NC 27617 | 2005-11-12 09:06:01 | 9999-12-31 23:59:59 | 2005-11-13 04:54:12 | 9999-12-31 23:59:59
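A minimal sketch of the SCD 2 maintenance this ETL performs, assuming a source extract called dbo.CustomerSource and the dimension dbo.DimCustomer pictured above (both names are illustrative):

DECLARE @Now datetime2 = SYSUTCDATETIME();

-- Step 1: close the current version when a tracked attribute changed.
UPDATE d
SET    d.ValidTo    = s.ModifyDate,   -- valid time: when the change really happened
       d.ExpireDate = @Now            -- transaction time: when the warehouse saw it
FROM   dbo.DimCustomer AS d
JOIN   dbo.CustomerSource AS s ON s.ID = d.ID
WHERE  d.ExpireDate = '9999-12-31 23:59:59'
  AND  s.Address <> d.Address;

-- Step 2: insert a new current version for changed and brand-new IDs.
INSERT dbo.DimCustomer (ID, Name, Address, ValidFrom, ValidTo, CreateDate, ExpireDate)
SELECT s.ID, s.Name, s.Address,
       s.ModifyDate, '9999-12-31 23:59:59',   -- valid time
       @Now,         '9999-12-31 23:59:59'    -- transaction time
FROM   dbo.CustomerSource AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.DimCustomer AS d
                   WHERE d.ID = s.ID
                     AND d.Address = s.Address
                     AND d.ExpireDate = '9999-12-31 23:59:59');

As the annotation on the slide implies, this whole pattern hinges on ModifyDate being present and trustworthy.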
9. Traditional Architecture: Focus on the Source

[Diagram: applications with their own databases (SQL, DB2, ...) feeding batch ETL jobs into an Enterprise Data Warehouse and downstream marts; storage and query live together.]

Focus Area One: Friction & Frustration

Data Quality
• Timeliness
  • Latency of change
  • Latency of build
• Consistency
  • Redundant ETL
• Accuracy
  • Filters
  • Logic
  • Source

Lead Time
• Custom ETL
• Manual ETL
• Business case and ceremony
• Domain knowledge

Dependencies
• Business logic
• Redundancy
• Downstream effects
• Team
10. Modern Architecture: The Push Method (Lambda)

[Diagram: events are collected and routed into two paths: a speed layer producing real-time views and a batch layer producing batch views, merged by a serving layer that supports query, modeling, and automation.]
11. Modern Architecture: The Push Method (Kappa)

[Diagram: events are collected into a unified log (stream) with an archive; consumers query, model, and automate directly from the stream.]
13. Modern Architecture: Lessons Learned
• Ingest (don't extract) disparate silos of data
• Store data in its atomic form (no transform)
• Collect changes as if they were events (immutable)
• Run downstream ETL more often (process less data each cycle; sketched below)
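For the last point, a minimal watermark sketch in T-SQL, assuming a hypothetical etl.Watermark table that remembers where each cycle left off:

DECLARE @LastWatermark datetime2 =
    (SELECT WatermarkValue FROM etl.Watermark WHERE TableName = 'AccountEventLog');

-- Each cycle touches only the delta since the previous cycle.
SELECT AccountID, CustomerID, [Type], Amount, EventDate
FROM   dbo.AccountEventLog
WHERE  EventDate > @LastWatermark;

-- Advance the watermark so the next (frequent, small) cycle starts here.
UPDATE etl.Watermark
SET    WatermarkValue = (SELECT MAX(EventDate) FROM dbo.AccountEventLog)
WHERE  TableName = 'AccountEventLog';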
15. Overview: Mirror Layer

[Diagram: Source → near real-time transform → Temporary Staging / Mirror Layer → intensive transform → Analytical Model.]

Why Have a Mirror Layer?
1. Improve the data structure of a source system (add primary keys, indexes; see the sketch after this slide)
2. Hide complexity related to the type of source system (SQL, API, Mainframe)
3. Improve the quality and performance of change tracking
4. Enable data governance programs by homogenizing sources
5. Enable prototyping of new automation solutions without developer support

Risks/Assumptions
This layer must be real-time and simple, close to the metal. The more it looks like another ETL layer, the more the risks will outweigh the benefits.
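To illustrate principle 1, a minimal sketch of a mirror table that adds the key and index the source system never had (the schema and names are hypothetical):

CREATE TABLE mirror.Account
(
    AccountID      int           NOT NULL,
    CustomerID     int           NOT NULL,
    AccountBalance decimal(18,2) NOT NULL,
    ModifyDate     datetime2     NOT NULL,
    -- Principle 1: give the mirror the primary key the source lacks.
    CONSTRAINT PK_mirror_Account PRIMARY KEY CLUSTERED (AccountID)
);

-- Index for the analytical access path the transactional source never needed.
CREATE NONCLUSTERED INDEX IX_mirror_Account_ModifyDate
    ON mirror.Account (ModifyDate) INCLUDE (AccountBalance);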
16. But all I read is hate for replication on the internets!
17. Mirror Layer: Replication in Production

[Diagram: a source database server hosting Sale Transaction and Customer Profile databases; each transaction log (T-LOG) feeds a publication with an article per table, pushed through the distributor (Dist) as commands to the subscriber.]

• Set up everything in a lower environment and replay production activity to get an idea of load.
• The source database is placed into an Always On availability group so that the database and replication can fail over.
• Distributor and subscriber are moved to their own failover cluster.
• Subscribers connect to an availability group listener so they can find the right server after a failover.
• Database and log backups are still taken regularly to support disaster recovery, but additional preparations are made to enable a smooth restore of replication.
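The publication in the diagram can be scripted roughly as follows. This is an abbreviated sketch using the standard replication stored procedures (it omits the snapshot and log reader agents); database, publication, and server names are illustrative:

-- Enable the source database for transactional publishing.
EXEC sp_replicationdboption
     @dbname = N'SourceDB', @optname = N'publish', @value = N'true';

-- Create the publication that will carry the two articles.
EXEC sp_addpublication
     @publication = N'MirrorPub', @status = N'active';

-- Add each source table as an article.
EXEC sp_addarticle
     @publication = N'MirrorPub', @article = N'SaleTransaction',
     @source_owner = N'dbo', @source_object = N'SaleTransaction';
EXEC sp_addarticle
     @publication = N'MirrorPub', @article = N'CustomerProfile',
     @source_owner = N'dbo', @source_object = N'CustomerProfile';

-- Push subscription pointing at the availability group listener.
EXEC sp_addsubscription
     @publication = N'MirrorPub', @subscriber = N'AGListener',
     @destination_db = N'MirrorDB', @subscription_type = N'Push';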
18. Mirror Layer Demo: Features of SQL Server

Base Table
AccountID | CustomerID | AccountBalance | ModifyDate
4568456 | 2342 | 1234758.23 | 2017-03-11 04:11:05
4624572 | 9875 | 5768.01 | 2017-03-11 04:13:15

Change Table (Internal)
AccountID | Operation | Columns
4568456 | INSERT |
4624572 | UPDATE | AccountBalance
4745733 | DELETE |

History Table
AccountID | CustomerID | AccountBalance | ModifyDate | CreateDate | ExpireDate
4624572 | 9875 | 5001.01 | 2017-03-10 06:19:01 | 2017-03-10 06:20:35 | 2017-03-11 04:14:22
4745733 | 8735 | 478893.33 | 2017-03-11 04:13:01 | 2017-03-11 04:14:59 | 2017-03-12 09:01:12
Change Tracking
• Net changes only
• No data
• Internal tables
• Internal functions
• Retention period only
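A minimal sketch of Change Tracking in action against the base table above (the database name is illustrative):

-- Turn on change tracking at the database and table level.
ALTER DATABASE MirrorDB
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Account ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);

-- Net changes since the last sync: keys and operations only, no data.
DECLARE @last_sync bigint = 0;   -- persist this between ETL runs
SELECT ct.AccountID, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_COLUMNS
FROM   CHANGETABLE(CHANGES dbo.Account, @last_sync) AS ct;

-- Save this as the watermark for the next pull.
SELECT CHANGE_TRACKING_CURRENT_VERSION();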
Temporal Tables
• Net changes not automatic
• Data
• Normal tables
• T-SQL language integration
• Full support for archiving
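And the temporal table equivalent: one DDL statement and SQL Server maintains the history table itself. Note that the period columns record transaction time, despite their conventional names:

CREATE TABLE dbo.Account
(
    AccountID      int           NOT NULL PRIMARY KEY,
    CustomerID     int           NOT NULL,
    AccountBalance decimal(18,2) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AccountHistory));

-- Time travel is plain T-SQL; net changes still require your own query.
SELECT * FROM dbo.Account FOR SYSTEM_TIME AS OF '2017-03-10 06:20:00';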
#1 challenge: active = observation is difficult, passive = logs are esoteric and sheltered
Which is more valuable? In general, logs are more valuable; however, they can be expensive to implement. It all depends on controls, trust, and latency. If your source database has the right audit fields, they're trustworthy, and batch ETL is OK, then active change systems are the easy and cheap choice.
These two things are equivalent, and they had better be, because this is the foundation of reliability, availability, and resilience of SQL Server and all other RDBMS systems. It's why we mirror databases, ship logs, and build high availability into data center infrastructure.
A log is essentially a backup of all possible states of the table and any other possible derived table.
Which is more valuable?
As you approach real-time integration these two concepts are effectively interchangeable at the margin.
John spent $30,000 over 600 transactions with us while he lived in Flat Rock ($50/tr). He has spent $250 over 2 transactions since he "moved" to Raleigh ($125/tr). Raleigh is where our brick-and-mortar store resides.
What if you have to side-load new data from, say, an acquisition, merger, or archive situation? What if you have to truncate and reload this table? What about failure? You want valid time more than transaction time. Most data warehouses keep only transaction time.
If ModifyDate does not exist or you don’t trust it then you’ve got a rough road ahead.
LOTS OF CEREMONY.
This is a synchronous world. Applications have their own databases. We reach in and extract large amounts of data, bring it down to disk, and search for changes. We transform the data and load complex schemas with information.
How do you scale this system? You can't do it horizontally. You can only scale up: bigger SQL servers, SSD SANs.
Schemas must be designed and built before the Business can discover and analyze. Arbitrary questions are difficult to ask of the system and typically involve data points not yet modeled. In almost every experience, I have seen the Business's need for information outpace IT's capacity to build.
Of course, it’s never this simple….
This general architecture is called a lambda architecture. The traditional Extract portion of ETL is no longer relevant. We now “ingest” data in this architecture. Applications (and even devices) are “emitting” their events.
VALUE: robustness, fault-tolerance, low latency reads/writes, scalability, generalization, extensibility, minimal maintenance, ad hoc queries, debuggability.
I could fill this slide with the companies that implement this architecture including Microsoft, Walmart, Yahoo, LinkedIn, and Netflix.
Nathan Marz – creator of Storm
Jay Kreps – originally a software engineer at LinkedIn who was a key contributor to Kafka. LinkedIn is an interesting case because during Kreps' tenure the company went from a VLDB Oracle DW to Teradata to Hadoop, and along that path they discovered the value of immutable streams of data in stark contrast to batch analytics.
Clients have to keep track of their place in the log.
The unified log made data fungible.
Oil is fungible because it has equivalent value regardless of source. It is commoditized.
Data, like oil, derives its value from an even quality and its ability to flow freely between endpoints as a homogeneous commodity.
By simplifying the data to its most fundamental structure (a log), we can unify organizations around a data platform, and (click) we can unify our analytics process because we have standardized our data flow.
Most organizations already have a data warehouse. Frankly, the architectures previously presented make a lot of assumptions about the organization, like a reliance on software engineers and a particular approach to data integration and information management.
So how do we get started? Well, luckily SQL Server has been around for a very long time, and we can bootstrap a progressive architecture by understanding and applying the good parts of what we’ve learned from others in the field.
Cloud-born data should remain in the cloud.
Here we move from a pull to a push architecture. We are closer to applications emitting their own events. This is not another ETL layer; we are ingesting database transactions as they appear in real-time. This satisfies the principles of a mirror layer. Indeed, if you cannot satisfy these principles, it is best to move back to the traditional architecture.
With this architecture, we can support micro-batch and batch processes with a robust, fault-tolerant tool that is close to the metal and simple.
Downstream development becomes simpler and more confident: the focus shifts to steering the analytical model and away from tracking source system data changes. Data quality and governance metrics become trustworthy because the mirror layer is sentient.
Fundamentally, a mirror layer reflects source systems exactly table-by-table. It makes every source system look like a SQL Server database: DB2, flat files, APIs, etc. We are hiding the complexity of a heterogeneous environment. It becomes the source for analytical data models.
This is where you improve a system that you cannot control: you add primary keys, indexing, etc. Remember, a source system database is designed for transactional speed. We would rather it be designed for query speed; you can do this in the mirror layer.
This is also where you track changes. Especially when a source system tracks changes poorly, a mirror helps you iron out transactional history in a way that is robust and fault tolerant. It should be able to heal when a source system changes its schema during a release, for example.
By homogenizing sources, we are supporting data governance from a lineage perspective. Moreover, we are allowing governance personnel to access system data that they would not otherwise have access to in the production operation environment. Metrics can be compared more easily and new metrics can be created in less time.
Maybe most importantly, we are providing a layer for the Business to prototype the next valuable solution. The Business should not report from or run ad hoc queries on source system databases. They can run them on this layer and help us build high quality solutions faster through prototyping.
This is all possible with SQL Server Standard edition.
Always Encrypted makes this impossible.
Change tracking is essentially just metadata, but it sure is powerful and can transform your ETL process if all you need to know is what has changed.
Temporal tables are a fully supported time machine. Unfortunately, getting net changes is not automatic.
This is transaction time, but if replication is set up as continuous then we are very close to valid time at the margin. You will still need to seed valid time as part of an initial load.