The document discusses Cloudera's enterprise data cloud platform. It notes that data management is spread across multiple cloud and on-premises environments. The platform aims to provide an integrated data lifecycle that is easier to use, manage and secure across various business use cases. Key components include environments, data lakes, data hub clusters, analytic experiences, and a central control plane for management. The platform offers both traditional and container-based consumption options to provide flexibility across cloud, private cloud and on-premises deployment.
Analyzing StackExchange data with Azure Data Lake - BizTalk360
Big data is the new big thing, and storing the data is the easy part; gaining insights from your pile of data is something else entirely. Based on a data dump of the well-known StackExchange websites, we will store & analyse 150+ GB of data with Azure Data Lake Store & Analytics to gain some insights about their users. After that we will use Power BI to give an at-a-glance overview of our learnings.
If you are a developer who is interested in big data, this is your time to shine! We will use our existing SQL & C# skills to analyse everything without having to worry about running clusters.
Getting Started with Databricks SQL Analytics - Databricks
It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst’s user journey look like with the tool?
This session will introduce the new workspace and walk through the various key features – how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI.
If you’re truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you’ll need to be familiar with and include in your design patterns, and this session will set you on the right path.
Spark as a Service with Azure Databricks - Lace Lofranco
Presented at: Global Azure Bootcamp (Melbourne)
Participants will get a deep dive into one of Azure’s newest offerings: Azure Databricks, a fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure. In this session, we will go through Azure Databricks’ key collaboration features, cluster management, and tight data integration with Azure data sources. We’ll also walk through an end-to-end Recommendation System Data Pipeline built using Spark on Azure Databricks.
Delivering Insights from 20M+ Smart Homes with 500M+ Devices - Databricks
We started out processing big data using AWS S3, EMR clusters, and Athena to serve Analytics data extracts to Tableau BI.
However, as our data and team sizes increased, Avro schemas from source data evolved, and we attempted to serve analytics data through web apps, we hit a number of limitations in the AWS EMR and Glue/Athena approach.
This is a story of how we scaled out our data processing and boosted team productivity to meet our current demand for insights from 20M+ Smart Homes and 500M+ devices across the globe, from numerous internal business teams and our 150+ CSP partners.
We will describe lessons learnt and best practices established as we enabled our teams with Databricks autoscaling Job clusters and Notebooks and migrated our Avro/Parquet data to use the Metastore, SQL Endpoints and the SQLA Console, while charting the path to the Delta Lake…
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning Library (MLlib). It has built-in integration with many data sources, a workflow scheduler, real-time workspace collaboration, and performance improvements over traditional Apache Spark.
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen - MS Cloud Summit
This document provides an overview and demonstration of Azure Data Lake Store and Azure Data Lake Analytics. The presenter discusses how Azure Data Lake can store and analyze large amounts of data in its native format. Key capabilities of Azure Data Lake Store like unlimited storage, security features, and support for any data type are highlighted. Azure Data Lake Analytics is presented as an elastic analytics service built on Apache YARN that can process large amounts of data. The U-SQL language for big data analytics is demonstrated, along with using Visual Studio and PowerShell for interacting with Azure Data Lake. The presentation concludes with a question and answer section.
This presentation focuses on the value proposition for Azure Databricks for Data Science. First, the talk includes an overview of the merits of Azure Databricks and Spark. Second, the talk includes demos of data science on Azure Databricks. Finally, the presentation includes some ideas for data science production.
Accelerate Data Science Initiatives: Databricks & Privacera - Databricks
Accelerating Data Science Initiatives with Databricks’ Rapid SQL Analytics and Privacera’s Centralized Data Access Governance.
Databricks’ SQL Analytics helps data teams consolidate and simplify their data architectures. With SQL Analytics, data teams can perform BI and SQL workloads on the same multi-cloud lakehouse architecture enabling data scientists to perform advanced analytics on unstructured and large-scale data. This session will explore how Privacera’s advanced security, privacy, and governance capabilities seamlessly integrate with Databricks’ unified SQL Analytics approach to provide single pane visibility of data analytics from a centralized location. Attendees will learn how to:
Rapidly access data to run high-fidelity analytics
Implement a fully secure solution that ensures productivity, while controlling data access at fine-grained levels (row, column, and file)
Easily enable consistent access policies across all systems and applications
Support true data transparency across enterprises
Comply with stringent industry and privacy regulations like GDPR, LGPD, HIPAA, CCPA, PCI DSS, RTBF, and more with rich auditing and reporting
Using Redash for SQL Analytics on Databricks - Databricks
This talk gives a brief overview with a demo performing SQL analytics with Redash and Databricks. We will introduce some of the new features coming as part of our integration with Databricks following the acquisition earlier this year, along with a demo of the other Redash features that enable a productive SQL experience on top of Delta Lake.
Northwestern Mutual Journey – Transform BI Space to Cloud - Databricks
The volume of available data is growing by the second (to an estimated 175 zettabytes by 2025), and it is becoming increasingly granular in its information. With that change, every organization is moving toward building a data-driven culture. We at Northwestern Mutual share a similar story of driving toward data-driven decisions to improve both efficiency and effectiveness. Legacy system analysis revealed bottlenecks, excesses, duplications, etc. Based on the ever-growing need to analyze more data, our BI team decided to move to a more modern, scalable, cost-effective data platform. As a financial company, data security is as important as ingestion of data. In addition to fast ingestion and compute, we needed a solution that could support column-level encryption and role-based access for different teams to our data lake.
In this talk we describe our journey to move hundreds of ELT jobs from our current MSBI stack to Databricks and to build a data lake (using the Lakehouse pattern), and how we reduced our daily data load time from 7 hours to 2 hours while gaining the capability to ingest more data. We share our experience, challenges, learnings, architecture and design patterns from this huge migration effort, as well as the tools and frameworks our engineers built to ease the learning curve for engineers new to Apache Spark. You will leave this session with a better understanding of what migrating to Apache Spark/Databricks would mean for you and your organization.
Part 3 - Modern Data Warehouse with Azure Synapse - Nilesh Gule
Slide deck of the third part of building a Modern Data Warehouse using Azure. This session covered Azure Synapse, formerly SQL Data Warehouse. We look at the Azure Synapse architecture, external files, and integration with Azure Data Factory.
The recording of the session is available on YouTube
https://www.youtube.com/watch?v=LZlu6_rFzm8&WT.mc_id=DP-MVP-5003170
This document provides an overview of Azure Databricks, including:
- Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure cloud services. It includes Spark SQL, streaming, machine learning libraries, and integrates fully with Azure services.
- Clusters in Azure Databricks provide a unified platform for various analytics use cases. The workspace stores notebooks, libraries, dashboards, and folders. Notebooks provide a code environment with visualizations. Jobs and alerts can run and notify on notebooks.
- The Databricks File System (DBFS) stores files in Azure Blob storage in a distributed file system accessible from notebooks. Business intelligence tools can connect to Databricks clusters via JDBC.
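As a flavor of the notebook experience described above, here is a minimal sketch of reading a file from DBFS with Spark inside a Databricks notebook. The path is hypothetical, and `spark` and `display` are assumed to be the session and helper that Databricks notebooks provide by default.

```python
# Minimal sketch: read a CSV stored in DBFS and expose it to Spark SQL / BI tools.
# `spark` and `display` are provided by the Databricks notebook environment;
# the mount point below is hypothetical.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("dbfs:/mnt/raw/sales/2020/*.csv"))   # hypothetical mount point

df.createOrReplaceTempView("sales")             # queryable from Spark SQL and JDBC clients
display(spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region"))
```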
Big data requires a service that can orchestrate and operationalize processes to refine the enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
Modern DW Architecture
- The document discusses modern data warehouse architectures using Azure cloud services like Azure Data Lake, Azure Databricks, and Azure Synapse. It covers storage options like ADLS Gen 1 and Gen 2 and data processing tools like Databricks and Synapse. It highlights how to optimize architectures for cost and performance using features like auto-scaling, shutdown, and lifecycle management policies. Finally, it provides a demo of a sample end-to-end data pipeline.
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ... - Databricks
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
Big data requires a service that can orchestrate and operationalize processes to refine the enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
Integration Monday - Analysing StackExchange data with Azure Data Lake - Tom Kerkhove
Big data is the new big thing, and storing the data is the easy part; gaining insights from your pile of data is something else entirely.
Based on a data dump of the well-known StackExchange websites, we will store & analyse 150+ GB of data with Azure Data Lake Store & Analytics to gain some insights about their users. After that we will use Power BI to give an at-a-glance overview of our learnings.
If you are a developer who is interested in big data, this is your time to shine! We will use our existing SQL & C# skills to analyse everything without having to worry about running clusters.
The document discusses Azure Data Factory V2 data flows. It will provide an introduction to Azure Data Factory, discuss data flows, and have attendees build a simple data flow to demonstrate how they work. The speaker will introduce Azure Data Factory and data flows, explain concepts like pipelines, linked services, and data flows, and guide a hands-on demo where attendees build a data flow to join customer data to postal district data to add matching postal towns.
The document discusses Azure Data Factory and its capabilities for cloud-first data integration and transformation. ADF allows orchestrating data movement and transforming data at scale across hybrid and multi-cloud environments using a visual, code-free interface. It provides serverless scalability without infrastructure to manage along with capabilities for lifting and running SQL Server Integration Services packages in Azure.
IBM Cloud Day January 2021 - A well architected data lake - Torsten Steinbach
- The document discusses an IBM Cloud Day 2021 event focused on well-architected data lakes. It provides an overview of two sessions on data lake architecture and building a cloud native data lake on IBM Cloud.
- It also summarizes the key capabilities organizations need from a data lake, including visualizing data, flexibility/accessibility, governance, and gaining insights. Cloud data lakes can address these needs for various roles.
Automating Data Quality Processes at Reckitt - Databricks
Reckitt is a fast-moving consumer goods company with a portfolio of famous brands and over 30k employees worldwide. At that scale, small projects can quickly grow into big datasets, and processing and cleaning all that data can become a challenge. To solve that challenge we have created a metadata-driven ETL framework for orchestrating data transformations through parametrised SQL scripts. It allows us to create various paths for our data as well as easily version control them. The approach of standardising incoming datasets and creating reusable SQL processes has proven to be a winning formula. It has helped simplify complicated landing/stage/merge processes and allowed them to be self-documenting.
But this is only half the battle, we also want to create data products. Documented, quality assured data sets that are intuitive to use. As we move to a CI/CD approach, increasing the frequency of deployments, the demand of keeping documentation and data quality assessments up to date becomes increasingly challenging. To solve this problem, we have expanded our ETL framework to include SQL processes that automate data quality activities. Using the Hive metastore as a starting point, we have leveraged this framework to automate the maintenance of a data dictionary and reduce documenting, model refinement, testing data quality and filtering out bad data to a box filling exercise. In this talk we discuss our approach to maintaining high quality data products and share examples of how we automate data quality processes.
Delta Lake is an open-source innovation that brings new capabilities for transactions, version control and indexing to your data lakes. We uncover how Delta Lake benefits you and why it matters. Through this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which supports concurrent read/write operations and enables efficient insert, update, delete, and rollback capabilities. It allows background file optimization through compaction and Z-order partitioning, achieving better performance. In this presentation, we will learn about Delta Lake's benefits, how it solves common data lake challenges, and, most importantly, the new Delta Time Travel capability.
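To make the upsert and time travel capabilities mentioned above concrete, here is a minimal PySpark sketch, assuming the open-source delta-spark package is installed; the table path and column names are hypothetical.

```python
# Sketch: write a Delta table, MERGE new rows into it, then read an earlier version.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable

builder = (SparkSession.builder.appName("delta-demo")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

path = "/tmp/events_delta"                      # hypothetical location
spark.range(100).withColumnRenamed("id", "event_id").write.format("delta").save(path)

# Upsert (MERGE) with ACID guarantees
updates = spark.range(50, 150).withColumnRenamed("id", "event_id")
(DeltaTable.forPath(spark, path).alias("t")
 .merge(updates.alias("u"), "t.event_id = u.event_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Time travel: read the table as it was before the MERGE
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(v0.count())   # 100 rows in version 0, 150 in the current version
```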
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign... - Michael Rys
Presentation by James Baker and myself on running cost-effective big data workloads with Azure Synapse and Azure Data Lake Storage (ADLS) at Microsoft Ignite 2020. Covers the modern data warehouse architecture supported by Azure Synapse, integration benefits with ADLS, and features that reduce cost such as Query Acceleration, integration of Spark and SQL processing with shared metadata, and .NET for Apache Spark support.
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De... - Databricks
Columbia is a data-driven enterprise, integrating data from all line-of-business-systems to manage its wholesale and retail businesses. This includes integrating real-time and batch data to better manage purchase orders and generate accurate consumer demand forecasts.
Machine Learning Data Lineage with MLflow and Delta Lake - Databricks
This document discusses machine learning data lineage using Delta Lake. It introduces Richard Zang and Denny Lee, then outlines the machine learning lifecycle and challenges of model management. It describes how MLflow Model Registry can track model versions, stages, and metadata. It also discusses how Delta Lake allows data to be processed continuously and incrementally in a data lake. Delta Lake uses a transaction log and file format to provide ACID transactions and allow optimistic concurrency control for conflicts.
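A minimal sketch of the model registration and stage-transition flow described above, using the MLflow tracking and Model Registry APIs; the model name, metric, and training data are illustrative.

```python
# Sketch: train a model, log it to MLflow, register it, and promote the new version.
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run() as run:
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register this run's model and move it through lifecycle stages
result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "iris_classifier")
MlflowClient().transition_model_version_stage(
    name="iris_classifier", version=result.version, stage="Staging")
```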
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat... - Microsoft Tech Community
In this session you will learn how to develop data pipelines in Azure Data Factory and build a cloud-based analytical solution, adopting modern data warehouse approaches with Azure SQL Data Warehouse and implementing incremental ETL orchestration at scale. With the multiple sources and types of data available in an enterprise today, Azure Data Factory enables full integration of data and direct storage in Azure SQL Data Warehouse for powerful, high-performance query workloads that drive a majority of enterprise applications and business intelligence applications.
Feature Store as a Data Foundation for Machine Learning - Provectus
This document discusses feature stores and their role in modern machine learning infrastructure. It begins with an introduction and agenda. It then covers challenges with modern data platforms and emerging architectural shifts towards things like data meshes and feature stores. The remainder discusses what a feature store is, reference architectures, and recommendations for adopting feature stores including leveraging existing AWS services for storage, catalog, query, and more.
The Deagital offer will help improve the data quality of your Historian / Data Lake by functionally qualifying your data (timed measures). This document presents the offer content and why you should choose Deagital. Regards, José Torres, Deagital.
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Data scientists and machine learning practitioners nowadays seem to be churning out models by the dozen, and they continuously experiment to find ways to improve their accuracy. They also use a variety of ML and DL frameworks & languages, and a typical organization may find that this results in a heterogeneous, complicated bunch of assets that require different types of runtimes, resources and sometimes even specialized compute to operate efficiently.
But what does it mean for an enterprise to actually take these models to "production"? How does an organization scale inference engines out and make them available for real-time applications without significant latencies? Different techniques are needed for batch (offline) inference and instant, online scoring. Data needs to be accessed from various sources, and cleansing and transformation of data need to be enabled prior to any predictions. In many cases, there may be no substitute for customized data handling with scripting either.
Enterprises also require additional auditing and authorizations built in, approval processes and still support a "continuous delivery" paradigm whereby a data scientist can enable insights faster. Not all models are created equal, nor are consumers of a model - so enterprises require both metering and allocation of compute resources for SLAs.
In this session, we will take a look at how machine learning is operationalized in IBM Data Science Experience (DSX), a Kubernetes based offering for the Private Cloud and optimized for the HortonWorks Hadoop Data Platform. DSX essentially brings in typical software engineering development practices to Data Science, organizing the dev->test->production for machine learning assets in much the same way as typical software deployments. We will also see what it means to deploy, monitor accuracies and even rollback models & custom scorers as well as how API based techniques enable consuming business processes and applications to remain relatively stable amidst all the chaos.
Speaker
Piotr Mierzejewski, Program Director Development IBM DSX Local, IBM
Want to know more about the Common Data Model and Service? Do you need to understand the difference between CDS for Apps and CDS for Analytics? Feel free to use these slides and send me your feedback.
Data Con LA 2022 - Pre-Recorded - Simplifying AI/ML using Databricks feature... - Data Con LA
Debu Sinha, Sr Specialist Solutions Architect - AI/ML at Databricks
AI/ ML/ Data Science
1. What are feature stores? 2. Why are they important? 3. Using Databricks and the feature store offering to streamline ML. This holds true for small companies as well. How we frame our approach to AI initiatives will determine their success. Don't worry, I am not a zealot. I will not tell you AI and ML are the cure-all and will solve all your problems. Some tasks are particularly well suited to these techniques, but not all. What I love about them is the fact that they allow us to tackle difficult problems that might otherwise be too daunting.
Gobblin is a unified data ingestion framework developed by LinkedIn to ingest large volumes of data from diverse sources into Hadoop. It provides a scalable and fault-tolerant workflow that extracts data, applies transformations, checks for quality, and writes outputs. Gobblin addresses challenges of operating multiple heterogeneous data pipelines by standardizing various ingestion tasks and metadata handling through its pluggable components.
IBM InfoSphere Data Architect 9.1 - Francis Arnaudiès - IBMInfoSphereUGFR
The document discusses IBM InfoSphere Data Architect, a tool for modeling, relating, and standardizing diverse data assets. It can design and manage enterprise data models, enforce standards, leverage industry data models, and optimize existing investments. The tool is based on the Eclipse platform and allows various users like data architects, database developers, and administrators to be more productive. It provides logical, physical, and dimensional modeling capabilities as well as tools to define and enforce standards to increase quality and governance.
Building the Artificially Intelligent Enterprise - Databricks
Mike Ferguson is Managing Director of Intelligent Business Strategies Limited and specializes in business intelligence/analytics and data management. He discusses building the artificially intelligent enterprise and transitioning to a self-learning enterprise. Some key challenges discussed include the siloed and fractured nature of current data and analytics efforts, with many tools and scripts in use without integration. He advocates sorting out the data foundation, implementing DataOps and MLOps, creating a data and analytics marketplace, and integrating analytics into business processes to drive value from AI.
Technically Speaking: How Self-Service Analytics Fosters Collaboration - Inside Analysis
This document summarizes an upcoming webinar series from Bloor Research Group on enterprise software and business intelligence technologies. The webinars will take place monthly from June to November, covering topics like intelligence, disruption, analytics, integration, databases, and cloud computing. Attendees can ask questions of presenters and get detailed analysis of innovative technologies. The webinars aim to reveal enterprise software characteristics and give vendors a chance to explain their products to analysts.
Conceptual vs. Logical vs. Physical Data Modeling - DATAVERSITY
A model is developed for a purpose. Understanding the strengths of each of the three Data Modeling types will prepare you with a more robust analyst toolkit. The program will describe modeling characteristics shared by each modeling type. Using the context of a reverse engineering exercise, delegates will be able to trace model components as they are used in a common data reengineering exercise that is also tied to a Data Governance exercise.
Learning objectives:
- Understanding the role played by models
- Differentiate appropriate use among conceptual, logical, and physical data models
- Understand the rigor of the round-trip data reengineering analyses
- Apply appropriate use of various Data Modeling types
Recent Gartner and Capgemini studies predict only around 25% of data science projects are successful and only around 15% make it to full-scale production. Of these, many degrade in performance and produce disappointing results within months of implementation. How can focusing on the desired business outcomes and business use cases throughout a data science project help overcome the odds?
Analytix Data Services provides data integration software and services. Its flagship product, AnalytiX Mapping Manager, is an enterprise solution for data mapping and metadata management. It helps customers accelerate project delivery through automating source-to-target mappings, enabling collaboration, and increasing productivity. The solution comprises modules for resource management, system management, and mapping specifications. It has over 50 customers worldwide.
Software engineering practices for the data science and machine learning life... - DataWorks Summit
With the advent of newer frameworks and toolkits, data scientists are now more productive than ever and starting to prove indispensable to enterprises. Typical organizations have large teams of data scientists who build out key analytics assets that are used on a daily basis and are an integral part of live transactions. However, quite a lot of chaos and complexity also gets introduced because of the state of the industry. Many packages used by data scientists are from open source, and even if they are well curated, there is a growing tendency to pick out the cutting-edge or unstable packages and frameworks to accelerate analytics. Different data scientists may use different versions of runtimes, different Python or R versions, or even different versions of the same packages. Data scientists predominantly work on their laptops, and it becomes difficult to reproduce their environments for use by others. Since data science is now a team sport across multiple personas, involving non-practitioners, traditional application developers, execs, and IT operators, how does an enterprise create a platform for productive cross-role collaboration?
Enterprises need a very reliable and repeatable process, especially when it results in something that affects their production environments. They also require a well managed approach that enables the graduation of an asset from development through a testing and staging process to production. Given the pace of businesses nowadays, the process needs to be quite agile and flexible too—even enabling an easy path to reversing a change. Compliance and audit processes require clear lineage and history as well as approval chains.
In the traditional software engineering world, this lifecycle has been well understood and best practices have been followed for ages. But what does it mean when you have non-programmers or users who are not really trained in software engineering philosophies, or who perceive all of this as "big process" roadblocks in their daily work? How do we engage them in a productive manner and yet support enterprise requirements for reliability, tracking, and a clear continuous integration and delivery practice? The presenters, in this session, will bring up interesting techniques based on their user research, real-life customer interviews, and productized best practices. The presenters also invite the audience to share their stories and best practices to make this a lively conversation.
Speaker
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Embarcadero® ER/Studio®, an industry-leading data modeling tool, helps companies discover, document, and re-use data assets. With round-trip database support, data architects have the power to easily reverse-engineer, analyze, and optimize existing databases. Productivity gains and enforcement of organizational standards can be achieved with ER/Studio’s strong collaboration capabilities.
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Contexti / Oracle - Big Data: From Pilot to Production - Contexti
The document discusses challenges in moving big data projects from pilots to production. It highlights that pilots have loose SLAs and focus on a few use cases and demonstrated insights, while production requires enforced SLAs, supporting many use cases and delivering actionable insights. Key challenges in the transition include establishing governance, skills, funding models and integrating insights into operations. The document also provides examples of technology considerations and common operating models for big data analytics.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E... - DATAVERSITY
Many data scientists are well grounded in creating accomplishment in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but it doesn’t count until their product gets to production and in use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
CA ERwin Modeling provides data modeling solutions to help reduce costs and increase ROI. Their next release, r8, will include improved visualization, customization, and productivity features. r8 is focused on balancing usability improvements and database support to increase user satisfaction and ROI. The data modeling market is seeing new players and a diversification in how solutions are used, with CA ERwin striving to scale their products to meet new requirements.
Building a New Platform for Customer Analytics - Caserta
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
Similar to Feature Store Overview - St. Louis Big Data IDEA Meetup, Aug 2020
Slides from the August 2021 St. Louis Big Data IDEA meeting from Sam Portillo. The presentation covers AWS EMR including comparisons to other similar projects and lessons learned. A recording is available in the comments for the meeting.
- Delta Lake is an open source project that provides ACID transactions, schema enforcement, and time travel capabilities to data stored in data lakes such as S3 and ADLS.
- It allows building a "Lakehouse" architecture where the same data can be used for both batch and streaming analytics.
- Key features include ACID transactions, scalable metadata handling, time travel to view past data states, schema enforcement, schema evolution, and change data capture for streaming inserts, updates and deletes.
Great Expectations is an open-source Python library that helps validate, document, and profile data to maintain quality. It allows users to define expectations about data that are used to validate new data and generate documentation. Key features include automated data profiling, predefined and custom validation rules, and scalability. It is used by companies like Vimeo and Heineken in their data pipelines. While helpful for testing data, it is not intended as a data cleaning or versioning tool. A demo shows how to initialize a project, validate sample taxi data, and view results.
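As a flavor of how such expectations look in code, here is a small sketch using Great Expectations' older pandas-backed API (`ge.from_pandas`); the columns and thresholds are illustrative rather than taken from the talk's demo.

```python
# Sketch: define a couple of expectations against a small DataFrame and validate it.
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({
    "passenger_count": [1, 2, 1, 4, None],
    "fare_amount": [7.5, 12.0, 5.25, 30.0, 9.75],
})

gdf = ge.from_pandas(df)
gdf.expect_column_values_to_not_be_null("passenger_count")
gdf.expect_column_values_to_be_between("fare_amount", min_value=0, max_value=500)

results = gdf.validate()
print(results.success)   # False here: one passenger_count is null
```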
Automate your data flows with Apache NiFi - Adam Doyle
Apache Nifi is an open source dataflow platform that automates the flow of data between systems. It uses a flow-based programming model where data is routed through configurable "processors". Nifi was donated to the Apache Foundation by the NSA in 2014 and has over 285 processors to interact with data in various formats. It provides an easy to use UI and allows users to string together processors to move and transform data within "flowfiles" through the system in a secure manner while capturing detailed provenance data.
Apache Iceberg Presentation for the St. Louis Big Data IDEA - Adam Doyle
Presentation on Apache Iceberg for the February 2021 St. Louis Big Data IDEA. Apache Iceberg is an open table format that works with engines such as Hive and Spark.
Slides from the January 2021 St. Louis Big Data IDEA meeting by Tim Bytnar regarding using Docker containers for a localized Hadoop development cluster.
Operationalizing Data Science St. Louis Big Data IDEA - Adam Doyle
The document provides an overview of the key steps for operationalizing data science projects:
1) Identify the business goal and refine it into a question that can be answered with data science.
2) Acquire and explore relevant data from internal and external sources.
3) Cleanse, shape, and enrich the data for modeling.
4) Create models and features, test them, and check with subject matter experts.
5) Evaluate models and deploy the best one with ongoing monitoring, optimization, and explanation of results.
Slides from the December 2019 St. Louis Big Data IDEA meetup group. Jon Leek discussed how the St. Louis Regional Data Alliance ingests, stores, and reports on their data.
Tailoring machine learning practices to support prescriptive analytics - Adam Doyle
Slides from the November St. Louis Big Data IDEA. Anthony Melson talked about how to engineer machine learning practices to better support prescriptive analytics.
Synthesis of analytical methods data driven decision-making - Adam Doyle
This document summarizes Dr. Haitao Li's presentation on synthesizing analytical methods for data-driven decision making. It discusses the three pillars of analytics - descriptive, predictive, and prescriptive. Various data-driven decision support paradigms are presented, including using descriptive/predictive analytics to determine optimization model inputs, sensitivity analysis, integrated simulation-optimization, and stochastic programming. An application example of a project scheduling and resource allocation tool for complex construction projects is provided, with details on its optimization model and software architecture.
Data Engineering and the Data Science Lifecycle - Adam Doyle
Everyone wants to be a data scientist. Data modeling is the hottest thing since Tickle Me Elmo. But data scientists don’t work alone. They rely on data engineers to help with data acquisition and data shaping before their model can be developed. They rely on data engineers to deploy their model into production. Once the model is in production, the data engineer’s job isn’t done. The model must be monitored to make sure that it retains its predictive power. And when the model slips, the data engineer and the data scientist need to work together to correct it through retraining or remodeling.
Data engineering Stl Big Data IDEA user group - Adam Doyle
Modern day Data Engineering requires creating reliable data pipelines, architecting distributed systems, designing data stores, and preparing data for other teams.
We’ll describe a year in the life of a Data Engineer who is tasked with creating a streaming data pipeline and touch on the skills necessary to set one up using Apache Spark.
Slides from the April 2019 meeting of the St. Louis Big Data IDEA meetup.
Big Data Retrospective - STL Big Data IDEA Jan 2019 - Adam Doyle
Slides from the STL Big Data IDEA meeting from January 2019. The presenters discussed technologies to continue using, stop using, and start using in 2019.
Enhanced data collection methods can help uncover the true extent of child abuse and neglect. This includes Integrated Data Systems from various sources (e.g., schools, healthcare providers, social services) to identify patterns and potential cases of abuse and neglect.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Generative Classifiers: Classifying with Bayesian decision theory, Bayes’ rule, Naïve Bayes classifier.
Discriminative Classifiers: Logistic Regression, Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Attribute selection measures- Gini impurity; Entropy, Regularization Hyperparameters, Regression Trees, Linear Support vector machines.
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
Open Source Contributions to Postgres: The Basics POSETTE 2024 - ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
3. Features
“A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is used both as input to models during training and when models are served in production.”
https://docs.feast.dev/user-guide/features
Key takeaways
• Features are not data
• Features enumerate information
• Not all features are equal
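To make the "features are not data" takeaway concrete, here is a small, hypothetical pandas example: raw click events are data, while the per-user count over a trailing window is a feature derived from them. The column names and dates are invented for illustration.

```python
# Sketch: derive a feature (clicks in the last 7 days per user) from raw event data.
import pandas as pd

events = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 2],
    "event_ts": pd.to_datetime(
        ["2020-08-01", "2020-08-03", "2020-08-02", "2020-08-05", "2020-08-06"]),
})

cutoff = pd.Timestamp("2020-08-07") - pd.Timedelta(days=7)
feature = (events[events["event_ts"] >= cutoff]
           .groupby("user_id").size()
           .rename("clicks_last_7d")        # the measurable property fed to a model
           .reset_index())
print(feature)
```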
4. Feature Engineering
Feature Engineering is the process of extracting features from raw data.
Feature Engineering Techniques
• Imputation
• Handling Outliers
• Binning
• Numerical Transform
• One-Hot Encoding
• Grouping
• Extraction
• Scaling
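A brief sketch of several of the techniques listed above (imputation, one-hot encoding, scaling) applied with scikit-learn; the toy columns and values are illustrative.

```python
# Sketch: impute and scale numeric columns, one-hot encode a categorical column.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

raw = pd.DataFrame({
    "age":     [34, None, 52, 23],
    "income":  [52000, 61000, None, 43000],
    "segment": ["gold", "silver", "gold", "bronze"],
})

numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
categorical = OneHotEncoder(handle_unknown="ignore")

features = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["segment"]),
]).fit_transform(raw)

print(features.shape)   # 4 rows, 2 scaled numeric + 3 one-hot columns
```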
5. Feature Challenges
• Feature Reuse Between Models
• Consistent Feature Definitions
• Latency / Recency
• Environmental Variation
• Unstable Dependencies
• Governance
• Versioning
6. Feature Store (architecture diagram)
Components shown: Feature Store API; Metadata / Model / Predictions store; Offline Data Store; Online Data Store; Batch Engine; Stream Engine; Batch Prediction; Stream Prediction.
7. Feature Store Use Cases
• Retrieve Feature Metadata
• Retrieve Feature Values
• Remove Features
• Store Features
• Stream Store Features
• Stream Retrieve Features
• Feature Versioning
• Model Versioning
• Record Predictions
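The use cases above can be pictured as a small client API. The following is a toy, in-memory sketch; the class and method names are invented for illustration and do not correspond to any specific feature store product.

```python
# Hypothetical, in-memory stand-in for a feature store client exercising a few use cases.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureStoreClient:
    _values: Dict[str, Dict[str, float]] = field(default_factory=dict)
    _metadata: Dict[str, dict] = field(default_factory=dict)
    _predictions: List[dict] = field(default_factory=list)

    def register_feature(self, name: str, description: str, version: int = 1) -> None:
        # "Retrieve Feature Metadata" reads from this registry
        self._metadata[name] = {"description": description, "version": version}

    def store_features(self, entity_id: str, features: Dict[str, float]) -> None:
        self._values.setdefault(entity_id, {}).update(features)

    def retrieve_feature_values(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        row = self._values.get(entity_id, {})
        return {n: row.get(n) for n in names}

    def record_prediction(self, entity_id: str, model: str, version: int, value: float) -> None:
        self._predictions.append(
            {"entity_id": entity_id, "model": model, "version": version, "value": value})


store = FeatureStoreClient()
store.register_feature("clicks_last_7d", "Click count over trailing 7 days")
store.store_features("user_1", {"clicks_last_7d": 12})
print(store.retrieve_feature_values("user_1", ["clicks_last_7d"]))
store.record_prediction("user_1", "churn_model", 3, 0.18)
```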
8. Data Pipeline
• Data engineers interact with a feature store by creating data pipeline definitions.
• Data pipeline definitions combine
– Data Sources
– Business definitions
– Transformation rules
– Streaming/Batch definitions
– Scheduling
• Data pipelines are executed by the feature store engines and stored in online and offline data stores.
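As a rough illustration of what such a pipeline definition might combine, here is a hypothetical declarative sketch; the field names, source path, and SQL are invented and do not follow any particular feature store's schema.

```python
# Hypothetical declarative pipeline definition: source + transformation + mode + schedule.
from dataclasses import dataclass


@dataclass
class FeaturePipeline:
    name: str
    source: str            # where the raw data lives
    transformation: str    # SQL or expression applied by the engine
    mode: str              # "batch" or "streaming"
    schedule: str          # cron-style schedule for batch runs


clicks_pipeline = FeaturePipeline(
    name="clicks_last_7d",
    source="s3://raw/click_events/",          # hypothetical location
    transformation=(
        "SELECT user_id, COUNT(*) AS clicks_last_7d "
        "FROM events WHERE event_ts >= now() - INTERVAL 7 DAYS GROUP BY user_id"),
    mode="batch",
    schedule="0 2 * * *",                      # daily at 02:00
)
print(clicks_pipeline)
```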
9. Feature Registry
• Data scientists interact with the feature store through the Feature Registry.
• They can search for and browse feature definitions.
• They can register data science models as a class of data pipeline.
10. Versioning and Monitoring
• Feature stores can assist with versioning and monitoring data science applications.
• Predictions are recorded in the feature store API, including source data, model used, version of that model, and the rendered prediction.
• Predictions can be compared with reality to determine the accuracy of the models.
• Models and versions are tracked and can be used to determine the lift provided by a particular instance of a model.
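A toy sketch of the prediction-versus-reality comparison described above, using pandas; all entity, model, and outcome values are illustrative.

```python
# Sketch: join logged predictions against observed outcomes and track accuracy per model version.
import pandas as pd

predictions = pd.DataFrame({
    "entity_id": ["u1", "u2", "u3", "u4"],
    "model":     ["churn"] * 4,
    "version":   [2, 2, 3, 3],
    "predicted": [1, 0, 1, 0],
})
actuals = pd.DataFrame({
    "entity_id": ["u1", "u2", "u3", "u4"],
    "actual":    [1, 0, 0, 0],
})

scored = predictions.merge(actuals, on="entity_id")
accuracy_by_version = (scored.assign(correct=scored.predicted == scored.actual)
                             .groupby(["model", "version"])["correct"].mean())
print(accuracy_by_version)   # e.g. version 2 -> 1.0, version 3 -> 0.5
```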
11. Feature Store Implementations
• Open Source
– GoJEK/Google Feast
• Product Offerings
– Logical Clocks Hopsworks
– Scribble Enrich
• Presentations Only
– Uber Michelangelo
– Airbnb Zipline
– SurveyMonkey ML Feature Store
– Netflix Metaflow
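For a flavor of the open-source option listed above, here is a hedged sketch of online feature retrieval with Feast, assuming a newer Feast release and an already-defined, already-applied feature repository; the feature view and entity names follow Feast's quickstart example and may differ in your setup.

```python
# Sketch: fetch online feature values for one entity from an existing Feast repository.
from feast import FeatureStore

store = FeatureStore(repo_path=".")   # directory containing feature_store.yaml

features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features)
```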