SlideShare a Scribd company logo

Streamlining Data Science Workflows with a Feature Catalog

Streamlining Data Science Workflows with a Feature Catalog

1 of 21
Download to read offline
Streamlining
Data Science
Workflows
with a Feature Catalog
Roel Bertens
Roel Bertens
Principal Data Scientist
What is the problem?
The Challenges
of Custom
Model Pipelines
The risk of different defintions
Do you recognize this?
Solution?

Recommended

Workshop - The Little Pattern That Could.pdf
Workshop - The Little Pattern That Could.pdfWorkshop - The Little Pattern That Could.pdf
Workshop - The Little Pattern That Could.pdfTobiasGoeschel
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business IntelligenceDavid Portnoy
 
Strategies and Tips for Building Enterprise Drupal Applications - PNWDS 2013
Strategies and Tips for Building Enterprise Drupal Applications - PNWDS 2013Strategies and Tips for Building Enterprise Drupal Applications - PNWDS 2013
Strategies and Tips for Building Enterprise Drupal Applications - PNWDS 2013Mack Hardy
 
Opticon18: Developer Night
Opticon18: Developer NightOpticon18: Developer Night
Opticon18: Developer NightOptimizely
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Tyler Nguyen
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
An Introduction to Microservices
An Introduction to MicroservicesAn Introduction to Microservices
An Introduction to MicroservicesAd van der Veer
 

More Related Content

Similar to Streamlining Data Science Workflows with a Feature Catalog

hcp-as-continuous-integration-build-artifact-storage-system
hcp-as-continuous-integration-build-artifact-storage-systemhcp-as-continuous-integration-build-artifact-storage-system
hcp-as-continuous-integration-build-artifact-storage-systemIngrid Fernandez, PhD
 
Why real integration developers ride Camels
Why real integration developers ride CamelsWhy real integration developers ride Camels
Why real integration developers ride CamelsChristian Posta
 
Working Software Over Comprehensive Documentation
Working Software Over Comprehensive DocumentationWorking Software Over Comprehensive Documentation
Working Software Over Comprehensive DocumentationAndrii Dzynia
 
Choosing right-automation-tool
Choosing right-automation-toolChoosing right-automation-tool
Choosing right-automation-toolBabuDevanandam
 
APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...
APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...
APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...apidays
 
Accelerating Data Science through Feature Platform, Transformers and GenAI
Accelerating Data Science through Feature Platform, Transformers and GenAIAccelerating Data Science through Feature Platform, Transformers and GenAI
Accelerating Data Science through Feature Platform, Transformers and GenAIFeatureByte
 
Enterprise Library 2.0
Enterprise Library 2.0Enterprise Library 2.0
Enterprise Library 2.0Raju Permandla
 
SpagoBI_CLLAP2009
SpagoBI_CLLAP2009SpagoBI_CLLAP2009
SpagoBI_CLLAP2009guest76d50b
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on DatabricksDataScienceConferenc1
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsGianmario Spacagna
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Uday Kothari
 
Real-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse AutomationReal-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse AutomationPatrick Van Renterghem
 
openCPQ - A React-Based Product-Configuration Toolkit
openCPQ - A React-Based Product-Configuration ToolkitopenCPQ - A React-Based Product-Configuration Toolkit
openCPQ - A React-Based Product-Configuration ToolkitTim Geisler
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
Backstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBackstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBrandenTimm1
 
MongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesMongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesHadi Ariawan
 
#1 - HTML5 Overview
#1 - HTML5 Overview#1 - HTML5 Overview
#1 - HTML5 Overviewiloveigloo
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with BackstageOpsta
 
Kelly potvin nosurprises_odtug_oow12
Kelly potvin nosurprises_odtug_oow12Kelly potvin nosurprises_odtug_oow12
Kelly potvin nosurprises_odtug_oow12Enkitec
 

Similar to Streamlining Data Science Workflows with a Feature Catalog (20)

hcp-as-continuous-integration-build-artifact-storage-system
hcp-as-continuous-integration-build-artifact-storage-systemhcp-as-continuous-integration-build-artifact-storage-system
hcp-as-continuous-integration-build-artifact-storage-system
 
Why real integration developers ride Camels
Why real integration developers ride CamelsWhy real integration developers ride Camels
Why real integration developers ride Camels
 
Working Software Over Comprehensive Documentation
Working Software Over Comprehensive DocumentationWorking Software Over Comprehensive Documentation
Working Software Over Comprehensive Documentation
 
Choosing right-automation-tool
Choosing right-automation-toolChoosing right-automation-tool
Choosing right-automation-tool
 
APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...
APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...
APIdays Helsinki 2019 - How API Will Help Win the Deals - the Case of Infrast...
 
Accelerating Data Science through Feature Platform, Transformers and GenAI
Accelerating Data Science through Feature Platform, Transformers and GenAIAccelerating Data Science through Feature Platform, Transformers and GenAI
Accelerating Data Science through Feature Platform, Transformers and GenAI
 
Enterprise Library 2.0
Enterprise Library 2.0Enterprise Library 2.0
Enterprise Library 2.0
 
SpagoBI_CLLAP2009
SpagoBI_CLLAP2009SpagoBI_CLLAP2009
SpagoBI_CLLAP2009
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho
 
Real-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse AutomationReal-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse Automation
 
openCPQ - A React-Based Product-Configuration Toolkit
openCPQ - A React-Based Product-Configuration ToolkitopenCPQ - A React-Based Product-Configuration Toolkit
openCPQ - A React-Based Product-Configuration Toolkit
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
Backstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBackstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptx
 
MongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesMongoDB Schema Design by Examples
MongoDB Schema Design by Examples
 
#1 - HTML5 Overview
#1 - HTML5 Overview#1 - HTML5 Overview
#1 - HTML5 Overview
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage
 
Kelly potvin nosurprises_odtug_oow12
Kelly potvin nosurprises_odtug_oow12Kelly potvin nosurprises_odtug_oow12
Kelly potvin nosurprises_odtug_oow12
 

More from GoDataDriven

Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenGoDataDriven
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowGoDataDriven
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationGoDataDriven
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerGoDataDriven
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo SanchezGoDataDriven
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...GoDataDriven
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022GoDataDriven
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022GoDataDriven
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022GoDataDriven
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022GoDataDriven
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022GoDataDriven
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanGoDataDriven
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesGoDataDriven
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...GoDataDriven
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...GoDataDriven
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofGoDataDriven
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 
The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019
The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019
The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019GoDataDriven
 
CI/CD with Azure DevOps and Azure Databricks
CI/CD with Azure DevOps and Azure DatabricksCI/CD with Azure DevOps and Azure Databricks
CI/CD with Azure DevOps and Azure DatabricksGoDataDriven
 

More from GoDataDriven (20)

Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small Screen
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organization
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics Engineer
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de Haan
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven Companies
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 
The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019
The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019
The world runs on AI - Tony Krijnen (Microsoft) at GoDataFest 2019
 
CI/CD with Azure DevOps and Azure Databricks
CI/CD with Azure DevOps and Azure DatabricksCI/CD with Azure DevOps and Azure Databricks
CI/CD with Azure DevOps and Azure Databricks
 

Recently uploaded

HayleyDerby_Market_Research_Spotify.docx
HayleyDerby_Market_Research_Spotify.docxHayleyDerby_Market_Research_Spotify.docx
HayleyDerby_Market_Research_Spotify.docxHayleyDerby
 
What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?Denodo
 
Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensKondapi V Siva Rama Brahmam
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsDataArchiva
 
Basics of Creating Graphs / Charts using Microsoft Excel
Basics of Creating Graphs / Charts using Microsoft ExcelBasics of Creating Graphs / Charts using Microsoft Excel
Basics of Creating Graphs / Charts using Microsoft ExcelTope Osanyintuyi
 
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
Artificial Intelligence for Vision:  A walkthrough of recent breakthroughsArtificial Intelligence for Vision:  A walkthrough of recent breakthroughs
Artificial Intelligence for Vision: A walkthrough of recent breakthroughsNikolas Markou
 
EXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDF
EXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDFEXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDF
EXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDFProject Cubicle
 
fundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptxfundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptxPoonamRijal
 
itc limited word file.pdf...............
itc limited word file.pdf...............itc limited word file.pdf...............
itc limited word file.pdf...............mahetamanav24
 
WOMEN IN TECH EVENT : Explore Salesforce Metadata.pptx
WOMEN IN TECH EVENT : Explore Salesforce Metadata.pptxWOMEN IN TECH EVENT : Explore Salesforce Metadata.pptx
WOMEN IN TECH EVENT : Explore Salesforce Metadata.pptxyosra Saidani
 
Customer Satisfaction Data - Multiple Linear Regression Model.pdf
Customer Satisfaction Data -  Multiple Linear Regression Model.pdfCustomer Satisfaction Data -  Multiple Linear Regression Model.pdf
Customer Satisfaction Data - Multiple Linear Regression Model.pdfruwanp2000
 
Choose your perfect jacket.pdf
Choose your perfect jacket.pdfChoose your perfect jacket.pdf
Choose your perfect jacket.pdfAlexia Trejo
 
Introduction to data science.pdf-Definition,types and application of Data Sci...
Introduction to data science.pdf-Definition,types and application of Data Sci...Introduction to data science.pdf-Definition,types and application of Data Sci...
Introduction to data science.pdf-Definition,types and application of Data Sci...DrSumathyV
 
Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...
Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...
Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...Samuel Chukwuma
 
Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...ThinkInnovation
 
Ratio analysis, Formulas, Advantage PPt.pptx
Ratio analysis, Formulas, Advantage PPt.pptxRatio analysis, Formulas, Advantage PPt.pptx
Ratio analysis, Formulas, Advantage PPt.pptxSugumarVenkai
 
introduction-to-crimean-congo-haemorrhagic-fever.pdf
introduction-to-crimean-congo-haemorrhagic-fever.pdfintroduction-to-crimean-congo-haemorrhagic-fever.pdf
introduction-to-crimean-congo-haemorrhagic-fever.pdfSalamaAdel
 

Recently uploaded (18)

HayleyDerby_Market_Research_Spotify.docx
HayleyDerby_Market_Research_Spotify.docxHayleyDerby_Market_Research_Spotify.docx
HayleyDerby_Market_Research_Spotify.docx
 
What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?
 
Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample Screens
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data Goals
 
Basics of Creating Graphs / Charts using Microsoft Excel
Basics of Creating Graphs / Charts using Microsoft ExcelBasics of Creating Graphs / Charts using Microsoft Excel
Basics of Creating Graphs / Charts using Microsoft Excel
 
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
Artificial Intelligence for Vision:  A walkthrough of recent breakthroughsArtificial Intelligence for Vision:  A walkthrough of recent breakthroughs
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
 
EXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDF
EXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDFEXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDF
EXCEL-VLOOKUP-AND-HLOOKUP LECTURE NOTES ALL EXCEL VLOOKUP NOTES PDF
 
fundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptxfundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptx
 
itc limited word file.pdf...............
itc limited word file.pdf...............itc limited word file.pdf...............
itc limited word file.pdf...............
 
WOMEN IN TECH EVENT : Explore Salesforce Metadata.pptx
WOMEN IN TECH EVENT : Explore Salesforce Metadata.pptxWOMEN IN TECH EVENT : Explore Salesforce Metadata.pptx
WOMEN IN TECH EVENT : Explore Salesforce Metadata.pptx
 
Customer Satisfaction Data - Multiple Linear Regression Model.pdf
Customer Satisfaction Data -  Multiple Linear Regression Model.pdfCustomer Satisfaction Data -  Multiple Linear Regression Model.pdf
Customer Satisfaction Data - Multiple Linear Regression Model.pdf
 
Electricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptxElectricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptx
 
Choose your perfect jacket.pdf
Choose your perfect jacket.pdfChoose your perfect jacket.pdf
Choose your perfect jacket.pdf
 
Introduction to data science.pdf-Definition,types and application of Data Sci...
Introduction to data science.pdf-Definition,types and application of Data Sci...Introduction to data science.pdf-Definition,types and application of Data Sci...
Introduction to data science.pdf-Definition,types and application of Data Sci...
 
Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...
Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...
Cousera Cap Course Datasets containing datasets from a Fictional Fitness Trac...
 
Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...
 
Ratio analysis, Formulas, Advantage PPt.pptx
Ratio analysis, Formulas, Advantage PPt.pptxRatio analysis, Formulas, Advantage PPt.pptx
Ratio analysis, Formulas, Advantage PPt.pptx
 
introduction-to-crimean-congo-haemorrhagic-fever.pdf
introduction-to-crimean-congo-haemorrhagic-fever.pdfintroduction-to-crimean-congo-haemorrhagic-fever.pdf
introduction-to-crimean-congo-haemorrhagic-fever.pdf
 

Streamlining Data Science Workflows with a Feature Catalog

  • 1. Streamlining Data Science Workflows with a Feature Catalog Roel Bertens
  • 3. What is the problem?
  • 4. The Challenges of Custom Model Pipelines The risk of different defintions
  • 7. Feature Catalog The Solution to Organized and Efficient Feature Computation A way to structure and centralize your feature logic code. Preferably with these goals in mind: • User-friendly (easy to extend and to use) • Group / reuse logic • Balance flexibility and speed • Autogenerate docs and diagrams My definition
  • 8. Feature Catalog The Solution to Organized and Efficient Feature Computation
  • 10. Benefits of a Feature Catalog Single source of truth Iteration speed Efficient computation Quality Collaboration Re-usable documentation Consistency PoC and PROD
  • 11. What is the difference with a Feature Store?
  • 12. Feature Catalog vs Feature Store vs Feature Platform Source: https://huyenchip.com/2023/01/08/self-serve-feature-platforms.html
  • 13. Without … … and with Feature Store
  • 14. Do you need a Feature Store?
  • 15. A Feature Store is the possible next step Easy to integrate on any platform Features computed on demand (slow) Only compute what is required (cheap) Single use (no caching by the catalog itself) Feature Catalog Requires a more complex architecture Features precomputed (quick) Compute everything (expensive) Multiple use (cheap) Feature Store
  • 16. How does can a Feature Catalog look?
  • 17. Kickstart your Feature Catalog with this template Simple to use. Define features once and use them on multiple aggregation levels. Feature groups can builld on top of each other without redefining or recomputing. Don’t worry about loading all necessary tables, that is done for you. Only specify the feature names of interest. https://xebia.ai/catalog-code
  • 18. How does it compare to … ?
  • 19. Feature Catalog template: https://xebia.ai/catalog-code An example of how to structure your feature catalog using spark. flexible only a starting point (you still need to do the work) Featuretools: https://github.com/alteryx/featuretools A python library for automated feature engineering. lot of functionality out of the box no complex features (will only fit limited set of use cases) dbt Semantic Layer: https://www.getdbt.com/product/semantic-layer Designed for core business metrics where consistency and precision are of key importance. lot of functionality out of the box focus on metrics not features There are different tools out there, what to use?
  • 20. Blog: https://xebia.ai/catalog Summary Avoid confusion and duplication Create your own Feature Catalog Increase collaboration and quality Launch experiments and models faster Github: https://xebia.ai/catalog-code
  • 21. Disclaimer Whilst every care has been taken by Xebia to ensure that the information contained in this document is correct and complete, it is possible that this is not the case. Xebia provides the information "as is", without any warranty for its soundness, suitability for a different purpose or otherwise. Xebia is not liable for any damage which has occurred or may occur as a result of or in any respect related to the use of this information. Xebia may change or terminate this document at any time without further notice and shall not be responsible for any consequence(s) arising there from. Subject to this disclaimer, Xebia is not responsible for any contributions by third parties to this information. Copyright Notice Copyright © Xebia Nederland B.V., Laapersveld 27, 1213 VB, Hilversum, The Netherlands. All rights reserved. Xebia® is a registered trademark of Xebia Holding B.V. internationally. All other company references may be trademarks and/or service marks of their respective owners.