TECHNICAL DOCUMENTATION 101 FOR DATA ENGINEERS
Shristi Shrestha
Metadata
Metadata is information that describes other data. It helps us
understand the origin, structure, nature, and context of data.
Metadata: Text File
author
file size
date created
date modified
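As a small illustration of the text-file case, the sketch below reads the metadata that the file system itself keeps. The function name `file_metadata` and the sample file `notes.txt` are made up for this example. Author and creation date are left out on purpose, because `os.stat` does not report them consistently across platforms.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Return basic metadata that the file system keeps about a file."""
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "file_size_bytes": info.st_size,
        # st_mtime is the last-modified time in seconds since the epoch.
        "date_modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Demo on a throwaway file so the example does not depend on local paths.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "notes.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    print(file_metadata(sample_path))
```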
Metadata: Audio File
singer
album
track duration
bit rate
Metadata: Image File
resolution
dimensions
focal length
color profile
Metadata We Care About
(Examples)
When the table or database was last documented
Who it was documented by
Business case
Tools and resources
Queries run against that table
Table, column descriptions
Data lineage - table level, column level
Data quality
Metrics
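One possible way to keep several of the items above next to the pipeline code is a small record per table. This is only a sketch: the class `TableMetadata`, its field names, and the `orders` example are assumptions made for illustration, not a standard schema or the format of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TableMetadata:
    """Hypothetical documentation record for one table."""

    table_name: str
    description: str
    business_case: str
    documented_by: str
    last_documented: date
    # Column name -> plain-language description.
    column_descriptions: dict = field(default_factory=dict)
    # Table-level lineage: the tables this one is built from.
    upstream_tables: list = field(default_factory=list)

    def undocumented_columns(self, actual_columns):
        """List columns that exist in the table but have no description."""
        return [c for c in actual_columns if c not in self.column_descriptions]


orders = TableMetadata(
    table_name="orders",
    description="One row per customer order.",
    business_case="Revenue reporting.",
    documented_by="data-eng team",
    last_documented=date(2024, 1, 15),
    column_descriptions={"order_id": "Unique order key", "amount": "Order total"},
    upstream_tables=["raw_orders"],
)
print(orders.undocumented_columns(["order_id", "amount", "created_at"]))
```

Comparing the record against the real column list is a cheap way to notice when documentation has fallen behind the table.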
Metadata Management Software
Metadata management is the set of activities, technologies, and policies
for collecting, storing, and organizing metadata. Its goal is to make
data assets understandable and discoverable for users.
Metadata Management Software: Examples
Problem
01 Cost
Additional software cost
02 Complication
Additions: lots of custom metadata, updating code
Use Anything That Makes Sense
Good Practice: Where
Making it easy to find
Post a link in the README file in the repository containing
the pipeline code
Figure: the blue-filled section shows where documentation and lineage are extracted from. (Source linked in the original slide.)
Data Documentation Principles
Document with a purpose
Before you build out your documentation, ask:
Who will consume this documentation?
Why do they need this documentation?
How would they like to consume documentation?
ML and Data Professionals
Why do they need this documentation?
How would they like to consume documentation?
Business Stakeholders
Why do they need this documentation?
How would they like to consume documentation?
Keep it minimal but effective
Remove any unneeded documentation
Avoid redundancy across documents
Don’t document more than you have to
Be flexible and update the documentation as
you go through the life cycle of the project
Know the project's purpose
Document the customer’s business objectives
Define how your data science project will meet their needs
Set a vision for your project or product so that you can steer the
team in the right direction
Define clear evaluation metrics so that you can objectively
determine whether the project was successful
Conduct a cost-benefit analysis to help determine project
go/no-go and prioritization against other potential projects
Document what you are not looking to accomplish
Data Pipeline Documentation
Document the data
What data is being used for the model?
Why was this data selected (and other data sets excluded)?
How was the data obtained?
What are known issues in the data?
What does the data look like? (mean, median, mode, skewness, data volume, etc.)
How did you alter the data (transformations, imputations, other data cleaning techniques
applied, etc.)
Where is the data located?
How frequently is the data refreshed?
Is the data usage compliant with user agreements, data privacy best practices, and
relevant regulations? (if not, don’t use it)
What security protections do you have for data at rest and data in motion to ensure
compliance and data privacy?
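The "What does the data look like?" question can be answered in part with a short profile of each column. The sketch below uses only the Python standard library. The function name `profile_column` and the sample values are made up, and it assumes the column has at least one non-missing value. Skewness is left out because the `statistics` module does not provide it.

```python
import statistics


def profile_column(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),  # data volume
        "missing_count": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
    }


order_amounts = [10, 20, 20, None, 40]  # made-up sample values
print(profile_column(order_amounts))
```

Storing the profile alongside the documentation also gives later readers a baseline for spotting known issues in the data.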
Document the code
Can you understand your code after a year?
Will another person understand your code if you leave the team?
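One way to make both answers "yes" is a docstring that records what a step does and why it exists. The function below is a hypothetical pipeline step written only to show the idea. Its name, the `order_id` and `updated_at` fields, and the sample rows are all assumptions.

```python
def latest_record_per_key(rows, key="order_id", version="updated_at"):
    """Keep only the most recent record for each key.

    Why: the (hypothetical) source system sends a new row on every
    update, so downstream reports would double count without this step.

    Args:
        rows: list of dicts, each containing the `key` and `version` fields.
        key: field that identifies a record.
        version: field that orders updates; larger means newer.

    Returns:
        A list with one dict per distinct key, in first-seen key order.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[version] > current[version]:
            latest[row[key]] = row
    return list(latest.values())


sample_rows = [
    {"order_id": 1, "updated_at": "2024-01-01", "status": "new"},
    {"order_id": 2, "updated_at": "2024-01-02", "status": "new"},
    {"order_id": 1, "updated_at": "2024-01-03", "status": "shipped"},
]
print(latest_record_per_key(sample_rows))
```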
Build user documentation
Don’t forget your users!
Make sure they know how to use your system.
Data Documentation Templates
Depends on Data Lifecycle
CRISP-DM (Cross-Industry Standard Process for Data Mining)
Microsoft’s TDSP (Team Data Science Process)
Usually a custom life-cycle your company
follows
CRISP-DM Process
Figure: CRISP-DM process diagram. (Source linked in the original slide.)
Phases of CRISP-DM
Business understanding – What does the business need?
Data understanding – What data do we have/need? Is it
clean?
Data preparation – How do we organize the data for
modeling?
Modeling – What modeling techniques should we apply?
Evaluation – Which model best meets the business
objectives?
Deployment – How do stakeholders access the results?
Business Understanding
Determine business objectives
Assess situation
Determine data mining goals
Produce project plan
Data Understanding
Collect initial data
Describe data
Explore data
Verify data quality
Data Preparation
Select data
Clean data
Construct data
Integrate data
Format data
Modeling
Select modeling techniques
Generate test design
Build model
Assess model
Evaluation
Evaluate results
Review process
Determine next steps
Deployment
Plan deployment
Plan monitoring and maintenance
Produce final report
Review project
References
https://www.linkedin.com/pulse/data-lake-documentation-engineer-hands-on-simon-caruana/
https://www.mikulskibartosz.name/documenting-data-pipelines/
https://github.com/tylerwmarrs/data-engineering-project-doc-templates/tree/master/templates
https://towardsdatascience.com/data-documentation-best-practices-3e1a97cfeda6
https://about.gitlab.com/handbook/business-technology/data-team/documentation/
https://www.datascience-pm.com/documentation-best-practices/
https://www.sv-europe.com/crisp-dm-methodology/
https://github.com/patiegm/Datasci_Resources/blob/master/CRISP-DM%20Analysis%20Template.ipynb
https://web.archive.org/web/20220401041957/https://www.the-modeling-agency.com/crisp-dm.pdf
THANK YOU
