Treasure Data Hands-On: Managing Slowly
Changing Dimensions Using TD Workflow
Agenda
● Introduction
● Treasure Data Workflow
● Overview of Slowly Changing Dimensions
● Window Functions
● Handling Type 2 SCDs using Treasure Data
Introduction
• Scott Mitchell
• Senior Solution Engineer
• Work with Enterprise clients to
maximize the activation of the
client data
• smitchell@treasure-data.com
Introduction
Treasure Data is a Customer Data Platform
“Customer Data Platform (CDP) is a marketer-based management system
that creates a persistent, unified customer database that is accessible to
other systems. Data is pulled from multiple sources, cleaned, and combined
to create a single customer view. This structured data is then made available
to other marketing systems. CDP provides real-time segmentation for
sophisticated personalized marketing.”
https://en.wikipedia.org/wiki/Customer_Data_Platform
Our Customer Data Platform: Foundation
Data Management
1st party data
(Your data)
● Web
● Mobile
● Apps
● CRMs
● Offline
2nd & 3rd party DMPs
(enrichment)
Tool Integration
● Campaigns
● Advertising
● Social media
● Reporting
● BI & data science
ID Unification
Persistent Storage
Workflow Orchestration
Activation
All Your Data
Segmentation
Profiles Segments
Measurement
Treasure Data Workflow
DATA ORCHESTRATION AND WORKFLOW MANAGEMENT
• Workflow management across data input, processing, and output
• Supports both scheduled and trigger-based execution
• Cloud-based and client-hosted; the client-hosted version can run custom code
• The cloud-based version has both a web UI and a REST API
The core engine is built on our open source project, Digdag
Treasure Workflow allows users to build repeatable data processing pipelines that consist of
Treasure Data jobs.
Overview
Why use Treasure Workflow?
1. Enhanced Organization
• Organize your processing workflows into groups of similarly-purposed tasks
2. Reduce Errors
• No longer need to manage dependencies by scheduled time alone
3. Ease Error Handling
• Split large scripts & queries into smaller, more manageable, jobs
4. Improve Collaboration
• Organize your job flows into projects
Benefits
WORKFLOW DEFINITION: CLOSER LOOK
timezone: Asia/Tokyo

schedule:
  daily>: 07:00:00

_export:
  td:
    database: nishi

+load:
  td_load>: import/s3_load.yml
  database: nishi
  table: monthly_goods_sales

+daily:
  td>: queries/daily_open.sql
  create_table: daily_open

+monthly:
  td>: queries/monthly_open.sql
  result_connection: nishi_s3
  result_settings:
    bucket: nishitetsu-test
    path: /monthly_open.csv
• The file extension should be “.dig” for the file to be recognized as a workflow
• Standard YAML
• Task names are prefixed by “+”
• Operators are postfixed by “>”
• Schedules can be set with schedule
• Variables are supported via ${variable_name}
REPRESENTATIVE OPERATORS
Category               Name               Description
Control Flow           call>:             Call another workflow
                       loop>:             Repeat tasks a specified number of times
                       for_each>:         Loop through a specified list
                       if>:               if/else control flow
Treasure Data          td>:               Run a specified TD query
                       td_run>:           Run a saved query
                       td_ddl>:           Create, delete, rename, truncate tables
                       td_load>:          Invoke an input data transfer
                       td_for_each>:      Loop through a query result row by row
AWS                    s3_wait>:          Wait for new files in S3 & download
                       redshift>:         Run Redshift query
                       redshift_load>:    Load data into Redshift
                       redshift_unload>:  Unload data from Redshift
Google Cloud Platform  bq>:               Run BigQuery query
                       bq_extract>:       Unload data from BigQuery to GCS
Slowly Changing Dimensions
• Particular dimensions within a dataset that are prone to change
unpredictably
• Example: the phone number or email field of a CRM dataset
• Data available from a CRM usually represents the current, up-to-date value
of each field for each customer
• Storing a history of this customer data requires managing these slowly
changing dimensions (SCDs)
Different Ways to Handle SCDs
• Type 1
• Type 2
• Type 3
• Type 4
Type 1: Overwrite the field
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 1:
company_id  company_name     company_state
123         Sterling Cooper  California
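The Type 1 overwrite can be sketched as a single in-place update. This is a minimal illustration, not the deck's implementation: it assumes Python's built-in sqlite3 module as a local stand-in for a Treasure Data query engine, and the helper name apply_type1 is hypothetical.

```python
import sqlite3


def apply_type1(conn, company_id, new_state):
    """Overwrite company_state in place; the old value is lost (SCD Type 1)."""
    conn.execute(
        "UPDATE company SET company_state = ? WHERE company_id = ?",
        (new_state, company_id),
    )


# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, company_state TEXT)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York')")

apply_type1(conn, 123, "California")
type1_rows = conn.execute("SELECT * FROM company").fetchall()
print(type1_rows)
```

After the update only the California row remains, so no history of the New York value can be recovered from this table.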
Type 2: Keep both records, flag the “current” row
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 2:
company_id  company_name     company_state  is_current
123         Sterling Cooper  New York       0
123         Sterling Cooper  California     1
Type 3: Store the latest two values in one row
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 3:
company_id  company_name     company_state_current  company_state_previous
123         Sterling Cooper  California             New York
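A Type 3 change can be sketched as one update that shifts the current value into the "previous" column before storing the new one. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the real query engine; the helper name apply_type3 is hypothetical.

```python
import sqlite3


def apply_type3(conn, company_id, new_state):
    """Move the current state to the 'previous' column, then store the new state."""
    # The right-hand sides of SET read the row's values from before the update,
    # so company_state_previous receives the old current value.
    conn.execute(
        """
        UPDATE company
        SET company_state_previous = company_state_current,
            company_state_current = ?
        WHERE company_id = ?
        """,
        (new_state, company_id),
    )


# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company (
        company_id INTEGER,
        company_name TEXT,
        company_state_current TEXT,
        company_state_previous TEXT
    )
    """
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York', NULL)")

apply_type3(conn, 123, "California")
type3_rows = conn.execute("SELECT * FROM company").fetchall()
print(type3_rows)
```

Only the latest two values survive: a third change would push New York out of the row entirely.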
Type 4: Use a separate history table
SCD Type 4:
company
company_id  company_name     company_state
123         Sterling Cooper  California
company_history
company_id  company_name     company_state  last_modified_date
123         Sterling Cooper  New York       2007-06-19
123         Sterling Cooper  California     2008-10-12
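The Type 4 layout can be sketched as two writes per change: append the version to the history table, then replace the row in the main table. This is an illustrative sketch that assumes Python's built-in sqlite3 module as a stand-in; the helper name apply_type4 is hypothetical, while the table and column names follow the slide.

```python
import sqlite3


def apply_type4(conn, company_id, name, state, modified):
    """Append every version to company_history; keep only the latest in company."""
    conn.execute(
        "INSERT INTO company_history VALUES (?, ?, ?, ?)",
        (company_id, name, state, modified),
    )
    conn.execute("DELETE FROM company WHERE company_id = ?", (company_id,))
    conn.execute("INSERT INTO company VALUES (?, ?, ?)", (company_id, name, state))


# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (company_id INTEGER, company_name TEXT, company_state TEXT);
    CREATE TABLE company_history (
        company_id INTEGER, company_name TEXT,
        company_state TEXT, last_modified_date TEXT
    );
    """
)

apply_type4(conn, 123, "Sterling Cooper", "New York", "2007-06-19")
apply_type4(conn, 123, "Sterling Cooper", "California", "2008-10-12")

current_rows = conn.execute("SELECT * FROM company").fetchall()
history_rows = conn.execute(
    "SELECT * FROM company_history ORDER BY last_modified_date"
).fetchall()
print(current_rows)
print(history_rows)
```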
Window Functions
Type 2: Keep both records, flag the “current” row
Old Record:
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
New Record:
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
SCD Type 2:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
123         Sterling Cooper  California     2008-10-12        1
Window Functions
• Window functions perform calculations across rows of the query result
• They run after the ‘HAVING’ clause but before the ‘ORDER BY’ clause
• They are written in the ‘SELECT’ clause and display results in their own
column
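• They have three parts: the function, the partition specification, and the ordering specification
rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC)
function: rank()
partition specification: PARTITION BY company_id
ordering specification: ORDER BY lastmodifieddate DESC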
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
FROM company
company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
123         Sterling Cooper  California     2008-10-12
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        2
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
FROM company
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        2
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        2
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
FROM company
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        2
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              Connecticut    2018-05-22        2
124         CGC              New York       2010-08-22        1
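To compare the two orderings side by side, the following sketch runs the rank() window in both directions over the four example rows. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later, which added window functions) as a stand-in for the Treasure Data query engine; the helper name ranked_states and the alias rnk are hypothetical.

```python
import sqlite3


def ranked_states(conn, direction):
    """Return (company_id, company_state, rank), ranking by date in `direction`."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    query = f"""
        SELECT company_id, company_state,
               rank() OVER (PARTITION BY company_id
                            ORDER BY lastmodifieddate {direction}) AS rnk
        FROM company
        ORDER BY company_id, lastmodifieddate DESC
    """
    return conn.execute(query).fetchall()


# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

ranks_desc = ranked_states(conn, "DESC")  # newest row per company gets rank 1
ranks_asc = ranked_states(conn, "ASC")    # oldest row per company gets rank 1
print(ranks_desc)
print(ranks_asc)
```

With DESC the most recent row in each partition receives rank 1, which is the ordering needed to identify the current row.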
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  CASE WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1 THEN 1 ELSE 0 END AS is_current
FROM company
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
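The flag query can be checked end to end with a small sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for the Treasure Data query engine, and it orders the window by lastmodifieddate DESC so that the newest row in each partition is the one flagged as current. The constant name IS_CURRENT_QUERY is hypothetical.

```python
import sqlite3

# Flag the newest row per company_id as current (1) and all older rows as 0.
IS_CURRENT_QUERY = """
SELECT
    company_id,
    company_name,
    company_state,
    lastmodifieddate,
    CASE
        WHEN rank() OVER (PARTITION BY company_id
                          ORDER BY lastmodifieddate DESC) = 1
        THEN 1 ELSE 0
    END AS is_current
FROM company
ORDER BY company_id, lastmodifieddate DESC
"""

# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

flagged_rows = conn.execute(IS_CURRENT_QUERY).fetchall()
for row in flagged_rows:
    print(row)
```

One design point to keep in mind: rank() gives tied rows the same rank, so two rows for one company with an identical lastmodifieddate would both be flagged as current. If that can happen in the source data, row_number() with an extra tie-breaking column is a common alternative.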
Implementation in Treasure Data
1. Load incremental data from a data source to a staging table
2. Drop the target table that contains outdated SCD information
3. Window over the staging table, rebuilding the target table with the latest
SCD information
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company (dropped)
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
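The load, drop, and rebuild sequence above can be sketched as follows. This is an illustration under stated assumptions, not the deck's workflow: Python's built-in sqlite3 module (SQLite 3.25 or later) stands in for Treasure Data, the incremental load is simulated with plain inserts, and the helper names rebuild_target and fetch_target are hypothetical.

```python
import sqlite3


def rebuild_target(conn):
    """Drop target_company and rebuild it from the full history in staging_company."""
    conn.execute("DROP TABLE IF EXISTS target_company")
    conn.execute(
        """
        CREATE TABLE target_company AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               CASE WHEN rank() OVER (PARTITION BY company_id
                                      ORDER BY lastmodifieddate DESC) = 1
                    THEN 1 ELSE 0 END AS is_current
        FROM staging_company
        """
    )


def fetch_target(conn):
    """Return target rows in a stable order for display."""
    return conn.execute(
        "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate DESC"
    ).fetchall()


# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)

# Initial load: one version per company.
conn.executemany(
    "INSERT INTO staging_company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (124, "CGC", "New York", "2010-08-22"),
    ],
)
rebuild_target(conn)
initial_target = fetch_target(conn)

# Incremental load: newer versions are appended to the staging table.
conn.executemany(
    "INSERT INTO staging_company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)
rebuild_target(conn)
rebuilt_target = fetch_target(conn)
print(rebuilt_target)
```

Because the staging table keeps every version, the target can always be regenerated from scratch; the cost is a full window pass over the staging table on each run.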
Thank You and Questions
SCD Type 2 Workflow with Persistent Architecture
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
SCD Type 2 Workflow with Persistent Architecture
1. Store a temp table of the current rows that will not be current after the new data is
ingested
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
2. Delete from the target table any current rows that have a matching id in the new data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
3. Insert the temp rows into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
3. Insert the temp rows into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
4. Insert the new data into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
4. Insert the new data into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
(no rows)
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
123         Sterling Cooper  California     2008-10-12        1
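The four persistent-architecture steps can be sketched as one function that folds the staged rows into the target without rebuilding it. The sketch assumes Python's built-in sqlite3 module as a stand-in for Treasure Data, that each run stages at most one row per company_id, and that staged rows are newer than the rows already in the target. The helper name apply_increment is hypothetical, and clearing the staging table at the end is inferred from the final slide, where staging_company has no rows.

```python
import sqlite3


def apply_increment(conn):
    """Fold staging_company into target_company following the four steps."""
    # 1. Save the current target rows that the new data supersedes, flagged 0.
    conn.execute("DROP TABLE IF EXISTS tmp_no_longer_current")
    conn.execute(
        """
        CREATE TABLE tmp_no_longer_current AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               0 AS is_current
        FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company)
        """
    )
    # 2. Delete those current rows from the target table.
    conn.execute(
        """
        DELETE FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company)
        """
    )
    # 3. Re-insert the superseded rows, now carrying is_current = 0.
    conn.execute("INSERT INTO target_company SELECT * FROM tmp_no_longer_current")
    # 4. Insert the new rows as current, then empty the staging table.
    conn.execute(
        """
        INSERT INTO target_company
        SELECT company_id, company_name, company_state, lastmodifieddate, 1
        FROM staging_company
        """
    )
    conn.execute("DELETE FROM staging_company")


# In-memory SQLite database used only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_company (
        company_id INTEGER, company_name TEXT,
        company_state TEXT, lastmodifieddate TEXT
    );
    CREATE TABLE target_company (
        company_id INTEGER, company_name TEXT,
        company_state TEXT, lastmodifieddate TEXT, is_current INTEGER
    );
    INSERT INTO target_company VALUES (123, 'Sterling Cooper', 'New York', '2007-06-19', 1);
    INSERT INTO target_company VALUES (124, 'CGC', 'New York', '2010-08-22', 1);
    INSERT INTO staging_company VALUES (123, 'Sterling Cooper', 'California', '2008-10-12');
    """
)

apply_increment(conn)
persistent_rows = conn.execute(
    "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate"
).fetchall()
staging_count = conn.execute("SELECT count(*) FROM staging_company").fetchone()[0]
print(persistent_rows)
```

Compared with the drop-and-rebuild approach, this variant only touches the companies that appear in the new data, so the staging table does not need to hold the full history.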
Contact Information
• Scott Mitchell
• Senior Solution Engineer
• smitchell@treasure-data.com

More Related Content

What's hot

Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
Eric Matthews
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
Nilesh Gule
 
Why ODS? The Role Of The ODS In Today’s BI World And How Oracle Technology H...
Why ODS?  The Role Of The ODS In Today’s BI World And How Oracle Technology H...Why ODS?  The Role Of The ODS In Today’s BI World And How Oracle Technology H...
Why ODS? The Role Of The ODS In Today’s BI World And How Oracle Technology H...
C. Scyphers
 
Windows Azure Storage – Architecture View
Windows Azure Storage – Architecture ViewWindows Azure Storage – Architecture View
Windows Azure Storage – Architecture View
Chaowlert Chaisrichalermpol
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
Databricks
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
BizTalk360
 
warner-DP-203-slides.pptx
warner-DP-203-slides.pptxwarner-DP-203-slides.pptx
warner-DP-203-slides.pptx
HibaB2
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
Amazon Web Services LATAM
 
Scalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDBScalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDB
Alluxio, Inc.
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing Girish Dhareshwar
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
Databricks
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
Mark Kromer
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
SHIKHA GAUTAM
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
Amazon Web Services
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
James Serra
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQL
Databricks
 
Best Practices of Data Modeling with InfoSphere Data Architect
Best Practices of Data Modeling with InfoSphere Data ArchitectBest Practices of Data Modeling with InfoSphere Data Architect
Best Practices of Data Modeling with InfoSphere Data Architect
Vladimir Bacvanski, PhD
 

What's hot (20)

Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
 
Why ODS? The Role Of The ODS In Today’s BI World And How Oracle Technology H...
Why ODS?  The Role Of The ODS In Today’s BI World And How Oracle Technology H...Why ODS?  The Role Of The ODS In Today’s BI World And How Oracle Technology H...
Why ODS? The Role Of The ODS In Today’s BI World And How Oracle Technology H...
 
Windows Azure Storage – Architecture View
Windows Azure Storage – Architecture ViewWindows Azure Storage – Architecture View
Windows Azure Storage – Architecture View
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
warner-DP-203-slides.pptx
warner-DP-203-slides.pptxwarner-DP-203-slides.pptx
warner-DP-203-slides.pptx
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Scalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDBScalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDB
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQL
 
Best Practices of Data Modeling with InfoSphere Data Architect
Best Practices of Data Modeling with InfoSphere Data ArchitectBest Practices of Data Modeling with InfoSphere Data Architect
Best Practices of Data Modeling with InfoSphere Data Architect
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 

Similar to Hands-On: Managing Slowly Changing Dimensions Using TD Workflow

SetFocus SQL Portfolio
SetFocus SQL PortfolioSetFocus SQL Portfolio
SetFocus SQL Portfoliogeometro17
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
Databricks
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson PortfolioKbengt521
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
Stamatis Zampetakis
 
Porfolio of Setfocus work
Porfolio of Setfocus workPorfolio of Setfocus work
Porfolio of Setfocus workKevinPSF
 
Datawarehousing with MySQL
Datawarehousing with MySQLDatawarehousing with MySQL
Datawarehousing with MySQL
Harshit Parekh
 
Pierre Xavier Portfolio
Pierre Xavier PortfolioPierre Xavier Portfolio
Pierre Xavier Portfoliopbxavier
 
AWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWSAWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWS
Dmitry Anoshin
 
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdfScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
alokindustries1
 
SQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cSQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19c
RachelBarker26
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development
Open Party
 
Sql Portfolio
Sql PortfolioSql Portfolio
Sql Portfolio
Shelli Ciaschini
 
Df12 Performance Tuning
Df12 Performance TuningDf12 Performance Tuning
Df12 Performance Tuning
Stuart Bernstein
 
Elshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order ManagementElshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order Management
Ahmed Elshayeb
 
Advanced Relevancy Ranking
Advanced Relevancy RankingAdvanced Relevancy Ranking
Advanced Relevancy Ranking
Search Technologies
 
Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniqueslucenerevolution
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
Ike Ellis
 
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Sergii Khomenko
 
1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx
MullaMainuddin
 
Why Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API IntegrationWhy Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API Integration
Jerod Johnson
 

Similar to Hands-On: Managing Slowly Changing Dimensions Using TD Workflow (20)

SetFocus SQL Portfolio
SetFocus SQL PortfolioSetFocus SQL Portfolio
SetFocus SQL Portfolio
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson Portfolio
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Porfolio of Setfocus work
Porfolio of Setfocus workPorfolio of Setfocus work
Porfolio of Setfocus work
 
Datawarehousing with MySQL
Datawarehousing with MySQLDatawarehousing with MySQL
Datawarehousing with MySQL
 
Pierre Xavier Portfolio
Pierre Xavier PortfolioPierre Xavier Portfolio
Pierre Xavier Portfolio
 
AWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWSAWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWS
 
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdfScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
 
SQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cSQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19c
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development
 
Sql Portfolio
Sql PortfolioSql Portfolio
Sql Portfolio
 
Df12 Performance Tuning
Df12 Performance TuningDf12 Performance Tuning
Df12 Performance Tuning
 
Elshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order ManagementElshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order Management
 
Advanced Relevancy Ranking
Advanced Relevancy RankingAdvanced Relevancy Ranking
Advanced Relevancy Ranking
 
Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniques
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
 
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
 
1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx
 
Why Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API IntegrationWhy Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API Integration
 

More from Treasure Data, Inc.

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for Marketers
Treasure Data, Inc.
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and Market
Treasure Data, Inc.
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
Treasure Data, Inc.
 
Hands On: Javascript SDK
Hands On: Javascript SDKHands On: Javascript SDK
Hands On: Javascript SDK
Treasure Data, Inc.
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Treasure Data, Inc.
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with Data
Treasure Data, Inc.
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data




Hands-On: Managing Slowly Changing Dimensions Using TD Workflow

  • 1. Treasure Data Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
  • 2. Agenda ● Introduction ● Treasure Data Workflow ● Overview of Slowly Changing Dimensions ● Window Functions ● Handling Type 2 SCDs using Treasure Data
  • 3. Introduction • Scott Mitchell • Senior Solution Engineer • Work with Enterprise clients to maximize the activation of the client data • smitchell@treasure-data.com
  • 4. Introduction Treasure Data is a Customer Data Platform “Customer Data Platform (CDP) is a marketer-based management system that creates a persistent, unified customer database that is accessible to other systems. Data is pulled from multiple sources, cleaned, and combined to create a single customer view. This structured data is then made available to other marketing systems. CDP provides real-time segmentation for sophisticated personalized marketing.” https://en.wikipedia.org/wiki/Customer_Data_Platform
  • 5. Our Customer Data Platform: Foundation
      Data Management:
        1st party data (your data): Web, Mobile, Apps, CRMs, Offline
        2nd & 3rd party DMPs (enrichment)
      Tool Integration: Campaigns, Advertising, Social media, Reporting, BI & data science
      Platform components: ID Unification, Persistent Storage, Workflow Orchestration
      Diagram labels: All Your Data, Segmentation (Profiles, Segments), Activation, Measurement
  • 7. DATA ORCHESTRATION AND WORKFLOW MANAGEMENT •Workflow management across data input, processing and output •Supports both scheduled & trigger-based execution •Cloud-based and Client-hosted. Client-hosted version can run custom code. •Cloud-based version has both web UI & REST API The core engine is built on our open source project Digdag
  • 8. Overview: Treasure Workflow allows users to build repeatable data processing pipelines that consist of Treasure Data jobs.
  • 9. Why use Treasure Workflow? Benefits:
      1. Enhanced Organization
         • Organize your processing workflows into groups of similarly-purposed tasks
      2. Reduce Errors
         • Dependencies no longer have to be managed by scheduled time alone
      3. Ease Error Handling
         • Split large scripts & queries into smaller, more manageable jobs
      4. Improve Collaboration
         • Organize your job flows into projects
  • 10. WORKFLOW DEFINITION: CLOSER LOOK
      timezone: Asia/Tokyo

      schedule:
        daily>: 07:00:00

      _export:
        td:
          database: nishi

      +load:
        td_load>: import/s3_load.yml
        database: nishi
        table: monthly_goods_sales

      +daily:
        td>: queries/daily_open.sql
        create_table: daily_open

      +monthly:
        td>: queries/monthly_open.sql
        result_connection: nishi_s3
        result_settings:
          bucket: nishitetsu-test
          path: /monthly_open.csv

      • File extension should be “.dig” to be recognized as a workflow
      • Standard YAML
      • Task names are prefixed by “+”
      • Operators are postfixed by “>”
      • Schedules can be set with schedule
      • Variables are supported via ${variable_name}
  • 11. REPRESENTATIVE OPERATORS
      Category              | Name              | Description
      Control Flow          | call>:            | Call another workflow
      Control Flow          | loop>:            | Repeat tasks a specified # of times
      Control Flow          | for_each>:        | Loop through a specified list
      Control Flow          | if>:              | if/else control flow
      Treasure Data         | td>:              | Run a specified TD query
      Treasure Data         | td_run>:          | Run a saved query
      Treasure Data         | td_ddl>:          | Create, delete, rename, truncate tables
      Treasure Data         | td_load>:         | Invoke an input data transfer
      Treasure Data         | td_for_each>:     | Loop through a query result row by row
      AWS                   | s3_wait>:         | Wait for new files in S3 & download
      AWS                   | redshift>:        | Run Redshift query
      AWS                   | redshift_load>:   | Load data into Redshift
      AWS                   | redshift_unload>: | Unload data from Redshift
      Google Cloud Platform | bq>:              | Run BigQuery query
      Google Cloud Platform | bq_extract>:      | Unload data from BigQuery to GCS
  • 13. Slowly Changing Dimensions • Particular dimensions within a dataset that are prone to change unpredictably • Example: the phone number or email field of a CRM dataset • Data available from a CRM usually represents the current, up-to-date value of each field for each customer • Storing a history of this customer data requires managing these slowly changing dimensions (SCDs)
  • 14. Different Ways to Handle SCDs • Type 1 • Type 2 • Type 3 • Type 4
  • 17. Type 1: Overwrite the field
      Old Record:
        company_id | company_name    | company_state
        123        | Sterling Cooper | New York
      New Record:
        company_id | company_name    | company_state
        123        | Sterling Cooper | California
      SCD Type 1:
        company_id | company_name    | company_state
        123        | Sterling Cooper | California
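
A Type 1 change can be sketched as a plain in-place update. The example below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in database (an assumption, not the Treasure Data engine) and reuses the `company` table and values from the slide.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company ("
    "company_id INTEGER PRIMARY KEY, company_name TEXT, company_state TEXT)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York')")

# Type 1: overwrite the changed field in place; the old value is lost.
with conn:
    conn.execute(
        "UPDATE company SET company_state = ? WHERE company_id = ?",
        ("California", 123),
    )

type1_rows = conn.execute("SELECT * FROM company").fetchall()
print(type1_rows)
```

After the update the table holds a single row for company 123, so nothing about the New York period can be recovered later.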
  • 18. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id | company_name    | company_state
        123        | Sterling Cooper | New York
      New Record:
        company_id | company_name    | company_state
        123        | Sterling Cooper | California
      SCD Type 2:
        company_id | company_name    | company_state | is_current
        123        | Sterling Cooper | New York      | 0
        123        | Sterling Cooper | California    | 1
  • 19. Type 3: Store the latest two values in one row
      Old Record:
        company_id | company_name    | company_state
        123        | Sterling Cooper | New York
      New Record:
        company_id | company_name    | company_state
        123        | Sterling Cooper | California
      SCD Type 3:
        company_id | company_name    | company_state_current | company_state_previous
        123        | Sterling Cooper | California            | New York
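
One way to picture Type 3 is an update that shifts the current value into the "previous" column before storing the new one. This is a minimal sketch using Python's sqlite3 module as a stand-in database (an assumption); the column names follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE company (
           company_id INTEGER PRIMARY KEY,
           company_name TEXT,
           company_state_current TEXT,
           company_state_previous TEXT)"""
)
conn.execute(
    "INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York', NULL)"
)

# Type 3: the right-hand sides of SET see the row as it was before the update,
# so the old current value moves to "previous" while the new value takes its place.
with conn:
    conn.execute(
        """UPDATE company
           SET company_state_previous = company_state_current,
               company_state_current = ?
           WHERE company_id = ?""",
        ("California", 123),
    )

type3_rows = conn.execute("SELECT * FROM company").fetchall()
print(type3_rows)
```

Only two values survive per field: a third change would push "New York" out of the row entirely.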
  • 20. Type 4: Use a separate history table
      SCD Type 4:
      company:
        company_id | company_name    | company_state
        123        | Sterling Cooper | California
      company_history:
        company_id | company_name    | company_state | last_modified_date
        123        | Sterling Cooper | New York      | 2007-06-19
        123        | Sterling Cooper | California    | 2008-10-12
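
The two-table layout of Type 4 could be maintained roughly as follows. This is a sketch under assumptions: sqlite3 stands in for the real database, and `apply_change` is a hypothetical helper name introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY, company_name TEXT, company_state TEXT);
    CREATE TABLE company_history (
        company_id INTEGER, company_name TEXT, company_state TEXT,
        last_modified_date TEXT);
    INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York');
    INSERT INTO company_history
        VALUES (123, 'Sterling Cooper', 'New York', '2007-06-19');
    """
)


def apply_change(conn, company_id, new_state, modified_date):
    """Type 4: overwrite the main table, then append the new version to history."""
    with conn:
        conn.execute(
            "UPDATE company SET company_state = ? WHERE company_id = ?",
            (new_state, company_id),
        )
        conn.execute(
            """INSERT INTO company_history
               SELECT company_id, company_name, company_state, ?
               FROM company WHERE company_id = ?""",
            (modified_date, company_id),
        )


apply_change(conn, 123, "California", "2008-10-12")
current_rows = conn.execute("SELECT * FROM company").fetchall()
history_rows = conn.execute(
    "SELECT * FROM company_history ORDER BY last_modified_date"
).fetchall()
print(current_rows)
print(history_rows)
```

The `company` table stays small and always current, while `company_history` grows by one row per change.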
  • 23. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | New York      | 2007-06-19
      New Record:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      SCD Type 2:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
        123        | Sterling Cooper | California    | 2008-10-12       | 1
  • 25. Window Functions • Window functions perform calculations across rows of the query result • They run after the ‘HAVING’ clause but before the ‘ORDER BY’ clause • They are written in the ‘SELECT’ clause and display results in their own column • They have three parts:
  • 26. Window Functions
      rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC)
      • function: rank()
      • partition specification: PARTITION BY company_id
      • ordering specification: ORDER BY lastmodifieddate DESC
  • 27. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
      FROM company
      company (input):
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | New York      | 2007-06-19
        123        | Sterling Cooper | California    | 2008-10-12
      Result:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | California    | 2008-10-12       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 2
  • 28. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
      FROM company
      Result (the rank restarts for each company_id partition):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | California    | 2008-10-12       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 2
        124        | CGC             | Connecticut   | 2018-05-22       | 1
        124        | CGC             | New York      | 2010-08-22       | 2
  • 29. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
      FROM company
      Result (with ASC, the oldest row of each company gets rank 1):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | California    | 2008-10-12       | 2
        123        | Sterling Cooper | New York      | 2007-06-19       | 1
        124        | CGC             | Connecticut   | 2018-05-22       | 2
        124        | CGC             | New York      | 2010-08-22       | 1
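
The effect of the ordering specification can be checked with a small experiment. The sketch below runs the slide's rank() expression in both directions using Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or newer, and the Treasure Data engines may differ in detail). `rank_rows` and the `row_rank` alias are names made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)


def rank_rows(conn, direction):
    """Rank each company's rows by lastmodifieddate in the given direction."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    sql = f"""
        SELECT company_id, company_state,
               rank() OVER (PARTITION BY company_id
                            ORDER BY lastmodifieddate {direction}) AS row_rank
        FROM company
        ORDER BY company_id, lastmodifieddate DESC
    """
    return conn.execute(sql).fetchall()


desc_ranks = rank_rows(conn, "DESC")  # newest row per company gets rank 1
asc_ranks = rank_rows(conn, "ASC")    # oldest row per company gets rank 1
print(desc_ranks)
print(asc_ranks)
```

Because the dates are stored as ISO-8601 text, string ordering matches chronological ordering here; a real table would more likely use a date or timestamp column.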
  • 30. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
      FROM company
      Desired result (a flag rather than a rank: 1 for the latest row of each company, 0 otherwise):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | California    | 2008-10-12       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
        124        | CGC             | Connecticut   | 2018-05-22       | 1
        124        | CGC             | New York      | 2010-08-22       | 0
  • 31. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             CASE WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1
                  THEN 1 ELSE 0 END AS is_current
      FROM company
      Result (ordering by lastmodifieddate DESC gives the latest row rank 1, which the CASE turns into the flag):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | California    | 2008-10-12       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
        124        | CGC             | Connecticut   | 2018-05-22       | 1
        124        | CGC             | New York      | 2010-08-22       | 0
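
The flag query can be tried end to end on the slide's sample rows. This is a sketch, assuming Python's sqlite3 module (SQLite 3.25 or newer) as a stand-in for the Treasure Data query engine; it orders by lastmodifieddate DESC so that rank 1 is the latest row of each company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

# Rank 1 within each company is the most recent row; CASE maps it to a 1/0 flag.
flagged = conn.execute(
    """
    SELECT company_id, company_name, company_state, lastmodifieddate,
           CASE WHEN rank() OVER (PARTITION BY company_id
                                  ORDER BY lastmodifieddate DESC) = 1
                THEN 1 ELSE 0 END AS is_current
    FROM company
    ORDER BY company_id, lastmodifieddate DESC
    """
).fetchall()

for row in flagged:
    print(row)
```

One caveat worth keeping in mind: rank() assigns the same rank to ties, so two rows for one company with an identical lastmodifieddate would both be flagged current. Using row_number() with an extra tie-breaking column is one possible way around that.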
  • 32. Implementation in Treasure Data
      1. Load incremental data from a data source to a staging table
      2. Drop the target table that contains outdated SCD information
      3. Window over the staging table, rebuilding the target table with the latest SCD information
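
The load, drop, and rebuild steps above can be simulated locally. The sketch below is an illustration under assumptions: sqlite3 (SQLite 3.25 or newer) replaces the Treasure Data tables and workflow tasks, and `load_increment` and `rebuild_target` are hypothetical helper names, not TD operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)


def load_increment(conn, rows):
    """Step 1: append an incremental batch to the staging table."""
    with conn:
        conn.executemany("INSERT INTO staging_company VALUES (?, ?, ?, ?)", rows)


def rebuild_target(conn):
    """Steps 2 and 3: drop the outdated target, then rebuild it from staging."""
    conn.execute("DROP TABLE IF EXISTS target_company")
    conn.execute(
        """
        CREATE TABLE target_company AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               CASE WHEN rank() OVER (PARTITION BY company_id
                                      ORDER BY lastmodifieddate DESC) = 1
                    THEN 1 ELSE 0 END AS is_current
        FROM staging_company
        """
    )
    conn.commit()


# First run: only the original records exist.
load_increment(conn, [
    (123, "Sterling Cooper", "New York", "2007-06-19"),
    (124, "CGC", "New York", "2010-08-22"),
])
rebuild_target(conn)
first_build = conn.execute(
    "SELECT company_id, company_state, is_current FROM target_company "
    "ORDER BY company_id"
).fetchall()

# Second run: changed records arrive and the target is rebuilt from scratch.
load_increment(conn, [
    (123, "Sterling Cooper", "California", "2008-10-12"),
    (124, "CGC", "Connecticut", "2018-05-22"),
])
rebuild_target(conn)
second_build = conn.execute(
    "SELECT company_id, company_state, is_current FROM target_company "
    "ORDER BY company_id, lastmodifieddate DESC"
).fetchall()
print(first_build)
print(second_build)
```

The staging table keeps every version ever loaded, so each rebuild re-scans the full history; that is simple to reason about but grows more expensive as staging grows.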
  • 35. Implementation in Treasure Data
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | New York      | 2007-06-19
        124        | CGC             | New York      | 2010-08-22
      target_company:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 1
        124        | CGC             | New York      | 2010-08-22       | 1
  • 36. Implementation in Treasure Data
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | New York      | 2007-06-19
        124        | CGC             | New York      | 2010-08-22
        123        | Sterling Cooper | California    | 2008-10-12
        124        | CGC             | Connecticut   | 2018-05-22
      target_company:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 1
        124        | CGC             | New York      | 2010-08-22       | 1
  • 37. Implementation in Treasure Data
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | New York      | 2007-06-19
        124        | CGC             | New York      | 2010-08-22
        123        | Sterling Cooper | California    | 2008-10-12
        124        | CGC             | Connecticut   | 2018-05-22
      target_company: (no rows shown)
  • 38. Implementation in Treasure Data
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | New York      | 2007-06-19
        124        | CGC             | New York      | 2010-08-22
        123        | Sterling Cooper | California    | 2008-10-12
        124        | CGC             | Connecticut   | 2018-05-22
      target_company:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | California    | 2008-10-12       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
        124        | CGC             | Connecticut   | 2018-05-22       | 1
        124        | CGC             | New York      | 2010-08-22       | 0
  • 40. SCD Type 2 Workflow with Persistent Architecture
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      target_company:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 1
        124        | CGC             | New York      | 2010-08-22       | 1
  • 41. SCD Type 2 Workflow with Persistent Architecture
      1. Store a temp table of the current rows that will not be current after the new data is ingested
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      target_company:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 1
        124        | CGC             | New York      | 2010-08-22       | 1
      tmp_no_longer_current:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
  • 43. SCD Type 2 Workflow with Persistent Architecture
      2. Delete from the data lake any current rows that have a matching id in the new data
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      target_company:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        124        | CGC             | New York      | 2010-08-22       | 1
      tmp_no_longer_current:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
  • 44. SCD Type 2 Workflow with Persistent Architecture
      3. Insert the temp rows into the target table
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      target_company (before the insert):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        124        | CGC             | New York      | 2010-08-22       | 1
      tmp_no_longer_current:
        company_id | company_name    | company_state | lastmodifieddate | is_current
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
  • 45. SCD Type 2 Workflow with Persistent Architecture
      3. Insert the temp rows into the target table
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      target_company (after the insert):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        124        | CGC             | New York      | 2010-08-22       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
  • 47. SCD Type 2 Workflow with Persistent Architecture
      4. Insert the new data into the target table
      staging_company:
        company_id | company_name    | company_state | lastmodifieddate
        123        | Sterling Cooper | California    | 2008-10-12
      target_company (before the insert):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        124        | CGC             | New York      | 2010-08-22       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
  • 48. SCD Type 2 Workflow with Persistent Architecture
      4. Insert the new data into the target table
      staging_company: (empty)
      target_company (after the insert):
        company_id | company_name    | company_state | lastmodifieddate | is_current
        124        | CGC             | New York      | 2010-08-22       | 1
        123        | Sterling Cooper | New York      | 2007-06-19       | 0
        123        | Sterling Cooper | California    | 2008-10-12       | 1
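
The four steps of the persistent approach can be sketched as a sequence of statements. This is only an illustration under stated assumptions: sqlite3 stands in for Treasure Data, `apply_staging` is a hypothetical helper name, each staging batch is assumed to hold at most one new row per company_id, and emptying the staging table at the end is an assumption drawn from the last slide showing it empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_company (
        company_id INTEGER, company_name TEXT, company_state TEXT,
        lastmodifieddate TEXT);
    CREATE TABLE target_company (
        company_id INTEGER, company_name TEXT, company_state TEXT,
        lastmodifieddate TEXT, is_current INTEGER);
    INSERT INTO staging_company
        VALUES (123, 'Sterling Cooper', 'California', '2008-10-12');
    INSERT INTO target_company VALUES
        (123, 'Sterling Cooper', 'New York', '2007-06-19', 1),
        (124, 'CGC', 'New York', '2010-08-22', 1);
    """
)


def apply_staging(conn):
    """Apply one staging batch to the target table without rebuilding it."""
    # 1. Store the current rows that the new data will supersede, flagged 0.
    conn.execute("DROP TABLE IF EXISTS tmp_no_longer_current")
    conn.execute(
        """CREATE TABLE tmp_no_longer_current AS
           SELECT company_id, company_name, company_state, lastmodifieddate,
                  0 AS is_current
           FROM target_company
           WHERE is_current = 1
             AND company_id IN (SELECT company_id FROM staging_company)"""
    )
    with conn:
        # 2. Delete current rows that have a matching id in the new data.
        conn.execute(
            """DELETE FROM target_company
               WHERE is_current = 1
                 AND company_id IN (SELECT company_id FROM staging_company)"""
        )
        # 3. Insert the temp rows (now flagged 0) back into the target table.
        conn.execute(
            "INSERT INTO target_company SELECT * FROM tmp_no_longer_current"
        )
        # 4. Insert the new data, flagged as current.
        conn.execute(
            """INSERT INTO target_company
               SELECT company_id, company_name, company_state,
                      lastmodifieddate, 1
               FROM staging_company"""
        )
        # Assumption: clear staging so the batch is not applied twice.
        conn.execute("DELETE FROM staging_company")


apply_staging(conn)
target_rows = conn.execute(
    "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate"
).fetchall()
staging_count = conn.execute("SELECT COUNT(*) FROM staging_company").fetchone()[0]
for row in target_rows:
    print(row)
```

Compared with the drop-and-rebuild approach, only rows for companies present in the batch are touched, and older history rows (already flagged 0) are never rewritten.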
  • 50. Contact Information • Scott Mitchell • Senior Solution Engineer • smitchell@treasure-data.com