Documenting Data Lineage with Dataedo
© 2023, Dataedo sp. z o.o.
Chase Summers
Director, Customer Operations
Piotr Kononow
CEO
Agenda
1. What is Data Lineage and Impact Analysis?
2. What’s the challenge?
3. Data Lineage in Dataedo – current solution and plans
4. What you need
5. Announcements
6. Q&A
What is Data Lineage
• The process of understanding, documenting and visualizing how data
flows through the ecosystem
Data Lineage vs Impact Analysis
Data Lineage of an object
Impact Analysis of an object
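One common way to read the distinction: data lineage walks the flow graph upstream from an object (where did its data come from), and impact analysis walks it downstream (what is affected if the object changes). Below is a minimal Python sketch under that reading. It assumes flows are stored as (source, destination) pairs. The names `dw_customers` and `sales_report` are hypothetical, and none of this represents how Dataedo stores or computes lineage.

```python
from collections import defaultdict, deque

# Hypothetical flows as (source, destination) pairs.
FLOWS = [
    ("crm_mails", "crm_customers"),
    ("crm_customers", "dw_customers"),
    ("dw_customers", "sales_report"),
]

def _reachable(start, edges):
    """Breadth-first walk over an adjacency map, excluding the start node."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

def lineage(obj, flows=FLOWS):
    """Upstream objects: everything that feeds `obj`, directly or indirectly."""
    upstream = defaultdict(set)
    for src, dst in flows:
        upstream[dst].add(src)
    return _reachable(obj, upstream)

def impact(obj, flows=FLOWS):
    """Downstream objects: everything affected by a change to `obj`."""
    downstream = defaultdict(set)
    for src, dst in flows:
        downstream[src].add(dst)
    return _reachable(obj, downstream)
```

The same edge list answers both questions; only the direction of traversal changes.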
Object level vs column level lineage
Manual vs automatic
• Fully automatic
• Some elements automatic
• Suggestions, human verification
• Fully manual
Why is understanding lineage a challenge?
SQL Views
Batch scripts
SQL Stored procedures
Database triggers
External tables
BI tools
Different SQL dialects
Different ETL tools
Pipelines
COPY INTO
CREATE TABLE AS
API calls
Webhooks
Manual entry
Application forms
Examples of Data Flows
Data flows: Stored Procedures
• Stored procedures
• Functions
• Triggers
[Diagram: Source → My Procedure → Destination, with tables crm_mails and crm_customers]
Data flows: Views
Data flows: ETL
Data flows: BI tools
Data flows: API and integration
Data Lineage in Dataedo
Under construction
Lineage for views from SQL parsing
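To illustrate the idea of deriving view lineage from SQL text, here is a deliberately naive Python sketch. It assumes a simple `CREATE VIEW` statement with unquoted identifiers and pulls source names from `FROM` and `JOIN` clauses with regular expressions. The view name `dbo.v_customer_mails` and its columns are hypothetical. A real parser has to handle CTEs, subqueries, quoted identifiers, comments and dialect differences, so this is not how Dataedo's parser works.

```python
import re

# Naive patterns: an unquoted, optionally schema-qualified identifier
# right after CREATE VIEW, FROM or JOIN.
_VIEW_RE = re.compile(
    r"\bCREATE\s+(?:OR\s+REPLACE\s+)?VIEW\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)
_SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def view_lineage(sql):
    """Return (view_name, sorted source names) for a simple CREATE VIEW."""
    match = _VIEW_RE.search(sql)
    if match is None:
        raise ValueError("not a CREATE VIEW statement")
    sources = sorted(set(_SOURCE_RE.findall(sql)))
    return match.group(1), sources

# Hypothetical view definition used only for illustration.
EXAMPLE_SQL = """
CREATE VIEW dbo.v_customer_mails AS
SELECT c.id, m.subject
FROM dbo.crm_customers AS c
JOIN dbo.crm_mails AS m ON m.customer_id = c.id
"""

view_name, sources = view_lineage(EXAMPLE_SQL)
```

This yields object-level lineage only; column-level lineage would need the select list resolved against each source.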
Lineage for views from dependencies
• Object level only
• Connectors that support dependencies
SQL parsing
BI tools
Power BI
Currently
• Datasets, columns
• Reports
• Object level lineage
• Some source<->dataset lineage
Soon
• Object level lineage
• On prem
ETL tools
Defining object level lineage manually
Step 1
Source Object: dbo.incoming_emails
Processor (object): dbo.load_customer_data
Processes (steps)
Destination Object: dbo.crm_customers
Inflow: dbo.incoming_emails → dbo.load_customer_data
Outflow: dbo.load_customer_data → dbo.crm_customers
Step 2
Mapping
Source columns: src_column_1, src_column_2
Destination columns: dest_column_1, dest_column_2
Manual lineage
• Create "processor" or use existing object
• Show inflow and outflow objects
Defining column level lineage
• For any lineage
• Inflow first/Outflow first
• Intelligent matching
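The manual model described above (a processor with inflow and outflow objects, then column mappings between them) can be sketched as a small data structure. This is an illustrative Python sketch only. The `Processor` class and its `map_column` method are hypothetical names, the pairing of `src_column_1` with `dest_column_1` is an assumption, and the structure is not Dataedo's repository schema.

```python
from dataclasses import dataclass, field

@dataclass
class Processor:
    """An object that moves data between inflow and outflow objects."""
    name: str
    inflows: list = field(default_factory=list)    # source objects
    outflows: list = field(default_factory=list)   # destination objects
    # Column mappings as (source_object, source_column, dest_object, dest_column).
    column_mappings: list = field(default_factory=list)

    def map_column(self, src_obj, src_col, dst_obj, dst_col):
        """Record a column mapping; object-level lineage must exist first."""
        if src_obj not in self.inflows:
            raise ValueError(f"{src_obj} is not an inflow of {self.name}")
        if dst_obj not in self.outflows:
            raise ValueError(f"{dst_obj} is not an outflow of {self.name}")
        self.column_mappings.append((src_obj, src_col, dst_obj, dst_col))

# Step 1: object-level lineage.
proc = Processor(
    name="dbo.load_customer_data",
    inflows=["dbo.incoming_emails"],
    outflows=["dbo.crm_customers"],
)

# Step 2: column-level lineage (the column pairing is assumed for illustration).
proc.map_column("dbo.incoming_emails", "src_column_1",
                "dbo.crm_customers", "dest_column_1")
```

Checking the inflow and outflow lists before accepting a column mapping mirrors the two-step order: object-level lineage first, then columns.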
Plans
More metadata for processes (steps) [Q2]
• Script
• Description
More metadata for column level lineage [Q2]
• Description
• Transformation
• Custom fields
Automatic lineage for saved queries [Q2]
Import from external source/Excel [Q2]
• Import through interface tables
• Paste to UI (Later)
Change: direct table to table lineage [Q2]
• Anything can be a processor
More object types [Q2]
To help document data ecosystem:
• ETL package
• Application form, list
• API
• …
Lineage Assistant [2023+]
• Shows tables/datasets without lineage
• Shows lineage without column lineage
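The two checks listed for the Lineage Assistant amount to simple gap queries over lineage metadata. Here is a hedged Python sketch of that idea. It assumes object flows are (source, destination) pairs and column flows are (source, source_column, destination, destination_column) tuples. The function names and the `audit_log` table are hypothetical, and the planned feature may work differently.

```python
def tables_without_lineage(tables, object_flows):
    """Tables that appear in no object-level flow, as source or destination."""
    linked = {name for flow in object_flows for name in flow}
    return sorted(t for t in tables if t not in linked)

def flows_without_column_lineage(object_flows, column_flows):
    """Object-level flows that have no column-level mapping recorded."""
    covered = {(src, dst) for src, _, dst, _ in column_flows}
    return [flow for flow in object_flows if flow not in covered]

# Hypothetical catalog content for illustration.
tables = ["crm_mails", "crm_customers", "audit_log"]
object_flows = [("crm_mails", "crm_customers")]
column_flows = []

missing_lineage = tables_without_lineage(tables, object_flows)
missing_columns = flows_without_column_lineage(object_flows, column_flows)
```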
Tableau [Q2+]
Q2
• Reports
• Databases and Data Sources
• Lineage between
Later
• Automatic lineage for Custom SQL Query
Power BI [Q2+]
• Column level lineage [?]
• Power BI On prem [Q2-3]
SSIS [Q3+]
Q3
• Sources other than SQL Server
• Column level lineage
• Import SQL sources to catalog
• Parse SQL for sources
Later
• Support for Execute SQL Task – SQL parsing
Snowflake [Q2]
• Snowpipes
• Query Log – COPY INTO
• External tables
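As an illustration of reading lineage out of a query log, the sketch below scans query texts for Snowflake-style `COPY INTO <table> FROM <stage or location>` load statements and returns (source, target) pairs. It is a naive regex-based Python sketch. The stage name, bucket URL and table names are hypothetical, statements with a column list or a subquery are not handled, and it is not Dataedo's implementation.

```python
import re

# Matches loads into a table from a stage (@...) or a quoted location.
# Unloads (COPY INTO @stage FROM table) do not match, because the
# target must start with a letter or underscore.
_COPY_RE = re.compile(
    r"\bCOPY\s+INTO\s+([A-Za-z_][\w.]*)\s+FROM\s+(@[^\s;()]+|'[^']+')",
    re.IGNORECASE,
)

def copy_into_flows(query_texts):
    """Return (source, target_table) pairs found in COPY INTO statements."""
    flows = []
    for text in query_texts:
        match = _COPY_RE.search(text)
        if match:
            flows.append((match.group(2).strip("'"), match.group(1)))
    return flows

# Hypothetical query log entries.
query_log = [
    "COPY INTO raw.crm_mails FROM @crm_stage/mails/ FILE_FORMAT = (TYPE = 'CSV');",
    "SELECT * FROM raw.crm_mails;",
    "copy into raw.events from 's3://bucket/events/'",
]

flows = copy_into_flows(query_log)
```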
Data Lineage: What you need
Licensing
New customers
Contact: Jonathan@dataedo.com
Current customers
• Plans Dataedo, Data Datalog: Upgrade to 10.4
• Legacy plans: contact Jonathan@dataedo.com
Slack channel – coming soon!
• Product/feature ideas, feedback and discussions
• Best practices
• Community
Who’s interested?
Hiring: Data Success Manager
• Help our customers have success with data and documentation
• Soon on: https://dataedo.com/careers
Q&A
Contact us:
dataedo@dataedo.com
Support: support@dataedo.com