Hopsworks Feature Store 2.0, a new paradigm

Jim Dowling, Logical Clocks
2020-12-14
1st Global Feature Stores for ML Meetup

Growing Consensus on how to manage complexity of AI
[Diagram labels: Data, Model, Prediction, φ(x); Connectors to External Data Sources; Feature Engineering; Feature Store Online; Feature Store Offline; Distributed Training; HyperParameter Tuning; Model Serving; A/B Testing; Monitoring; Pipeline Management]

Growing Consensus on how to manage complexity of AI
[Diagram labels: Data, Model, Prediction, φ(x); Data Collection; Data validation; Feature Engineering; Hardware Management; Distributed; HyperParameter Tuning; Model Serving; A/B Testing; Monitoring; Pipeline Management. Labels in capitals: ENGINEER, ML PLATFORM, TRAIN and SERVE, FEATURE STORE]

End-to-End ML Pipelines and the Feature Store
[Diagram labels: Data Lake, Warehouse, Kafka; Feature Engineering; Feature Store; Model Training; Model registry; Model Deploy; Model Serving; Features; Validate; Retrieve Feature Values]

End-to-End ML Pipelines and the Feature Store with CI/CD
[Diagram labels: Code and configuration; Experiments/Development; Data Lake, Warehouse, Kafka; Feature Engineering; Feature Store; Model Training; Model registry; Model Deploy; Model Serving; Model Monitoring; Features; Validate; Log Predictions, Retrieve Feature Statistics for Data Drift Detection]

End-to-End ML Pipelines and the Feature Store with CI/CD and Provenance
[Diagram labels: Code and configuration; Experiments/Development; Data Lake, Warehouse, Kafka; Feature Engineering; Feature Store; Model Training; Model registry; Model Deploy; Model Serving; Model Monitoring; Scaleout Metadata; Elasticsearch; Sync; Features; Validate; Log Predictions, Retrieve Feature Statistics for Data Drift Detection]
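
The monitoring step on this slide (log predictions, then retrieve feature statistics for data drift detection) can be sketched in plain Python. This is only an illustration and not the Hopsworks API: the names `feature_stats` and `drifted`, and the threshold of 3 training standard deviations, are assumptions made for the example.

```python
from statistics import mean, pstdev


def feature_stats(values):
    """Summary statistics recorded for a feature when training data is created."""
    return {"mean": mean(values), "std": pstdev(values)}


def drifted(train_stats, serving_values, threshold=3.0):
    """Flag drift when the serving mean is more than `threshold` training
    standard deviations away from the training mean (hypothetical rule)."""
    serving_mean = mean(serving_values)
    if train_stats["std"] == 0:
        # A constant training feature drifts as soon as the mean moves at all.
        return serving_mean != train_stats["mean"]
    shift = abs(serving_mean - train_stats["mean"]) / train_stats["std"]
    return shift > threshold


# Illustrative values only.
train_stats = feature_stats([10.0, 12.0, 11.0, 9.0, 13.0])
close_to_training = drifted(train_stats, [10.5, 11.5, 12.0])
far_from_training = drifted(train_stats, [30.0, 31.0, 29.0])
```

A production check would normally compare full distributions rather than means alone; the mean-shift rule is used here only because it is short.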

Hopsworks Feature Store Concepts: Features, Feature Groups, and Training Datasets
Features: name, Pclass, Sex, Survive, Name, Balance
Feature Groups: Titanic Passenger List (name, Pclass, Sex, Survive); Passenger Bank Account (Name, Balance)
Training Datasets: Join of the Feature Groups, with features Survive, name, PClass, Sex, Balance
File format: .tfrecord, .npy, .csv, .hdf5, .petastorm, etc.
Storage: Azure, S3, HopsFS
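
The relationship between the three concepts can be sketched without any feature store library: feature groups are keyed tables, and a training dataset is a join of feature groups projected onto the chosen features and label. The rows and the helper names `join_feature_groups` and `select_features` below are made up for illustration.

```python
# Two feature groups keyed by passenger name (illustrative rows, not real data).
passenger_list = {
    "alice": {"pclass": 1, "sex": "female", "survive": 1},
    "bob": {"pclass": 3, "sex": "male", "survive": 0},
}
bank_account = {
    "alice": {"balance": 1200.0},
    "carol": {"balance": 50.0},
}


def join_feature_groups(left, right):
    """Inner join of two feature groups on their shared primary key."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


def select_features(joined, features, label):
    """Project joined rows onto the chosen features plus the label column."""
    columns = features + [label]
    return [{name: row[name] for name in columns} for row in joined.values()]


training_dataset = select_features(
    join_feature_groups(passenger_list, bank_account),
    features=["pclass", "sex", "balance"],
    label="survive",
)
```

Only "alice" appears in both feature groups, so the inner join yields a single training row.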

Features are created/updated at different cadences
Click features every 10 secs
CDC data every 30 secs
User profile updates every hour
Featurized weblogs data every day
User-Entered Features (<2 secs)
[Diagram labels: Event Data; Real-Time Data; SQL DW; S3, HDFS; SQL; Feature Store; Online Feature Store, Low Latency Features, <10ms, Online App; Offline Feature Store, High Latency Features, TBs/PBs, Train, Batch App]
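
One way to picture the online/offline split is a store that writes every update twice: the online side keeps only the latest row per key for low-latency lookups, while the offline side keeps the full history for training and batch jobs. The class below is a toy in-memory sketch under that assumption; `DualFeatureStore` and its method names are hypothetical, not part of Hopsworks.

```python
class DualFeatureStore:
    """Toy model of an online (latest value) and offline (history) store."""

    def __init__(self):
        self.online = {}    # primary key -> (event_time, latest feature row)
        self.offline = []   # append-only history of (event_time, key, row)

    def insert(self, event_time, key, row):
        # Offline store: keep every version of the row.
        self.offline.append((event_time, key, dict(row)))
        # Online store: keep only the most recent row for each key.
        current = self.online.get(key)
        if current is None or event_time >= current[0]:
            self.online[key] = (event_time, dict(row))

    def get_online(self, key):
        """Single-key lookup, as an online application would issue."""
        return self.online[key][1]

    def get_history(self, key):
        """All versions of a key in event-time order, as a training job would read."""
        ordered = sorted(self.offline, key=lambda entry: entry[0])
        return [row for _, k, row in ordered if k == key]


store = DualFeatureStore()
store.insert(1, "user_1", {"clicks": 3})
store.insert(2, "user_1", {"clicks": 5})
```

After the two inserts, `store.get_online("user_1")` returns only the newest row, while `store.get_history("user_1")` returns both.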

FeatureGroup Ingestion in Hopsworks
[Diagram labels: User Clicks, DB Updates, User Profile Updates, Weblogs; Event Data; SQL DW; S3, HDFS; SQL; DataFrameAPI; Feature Store with ClickFeatureGroup, TableFeatureGroup, UserFeatureGroup, LogsFeatureGroup, RTFeatureGroup; Kafka Input; Hof: Real-time feature Engineering; Kafka Output; Online App; Train, Batch App]

Hopsworks Feature Store V1 API
First Feature Store with a General Purpose DataFrame API
Feature Store is a cache for materialized features, not a library.
Online and Offline Feature Stores to support low latency and scale, respectively
Reuse of Features means JOINS – Spark as a join engine

Hopsworks Feature Store V2 API
Enforce feature-group scope and schema+data versioning as best practice
Better support for multiple feature stores - join features from development and production feature stores
Better support for complex joins of features
First class API support for time-travel
Support any Python or Spark client with a single library

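Example Ingestion of data into a FeatureGroup
https://docs.hopsworks.ai/

# Assumes `spark` is an active SparkSession and `fs` is a feature store handle.
dataframe = spark.read.json("s3://dataset/rain.json")
# do feature engineering on your dataframe: min-max scale 'val' into 'precipitation'
# (min_val and max_val are the precomputed minimum and maximum of 'val')
dataframe = dataframe.withColumn('precipitation',
    (dataframe.val - min_val) / (max_val - min_val))
fg = fs.create_feature_group("rain",
    version=1,
    description="Rain features",
    primary_key=['date', 'location_id'],
    online_enabled=True)
fg.save(dataframe)
fg.add_tag(name="ingestion", value="Databricks:jim; Pii;notebook.ipynb")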
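
The feature engineering step in the ingestion example is min-max scaling, (val - min) / (max - min), which maps a column onto the range [0, 1]. A minimal plain-Python sketch of the same arithmetic follows; the function name `min_max_scale` and the choice to map a constant column to 0.0 are assumptions for the example.

```python
def min_max_scale(values):
    """Scale values to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([2.0, 4.0, 10.0])  # [0.0, 0.25, 1.0]
```

In Spark the minimum and maximum would be computed once with an aggregation and then applied to the column, as in the slide.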

Example Creation of Train/Test Data from a Feature Store

# Join features across FeatureGroups. Use "on=[..]" to explicitly enter the JOIN key.
feature_join = (rain_fg.select_all()
    .join(temperature_fg.select_all(), on=["date", "location_id"])
    .join(location_fg.select_all()))
sc = fs.get_storage_connector("myBucket", "S3")
td = fs.create_training_dataset("training_dataset", version=1,
    storage_connector=sc,
    data_format="tfrecords",
    description="Training dataset, TfRecords format",
    splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})
td.save(feature_join)
# When training a model, read the training data (use "test" to read test data):
ds = td.read(split="train")
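
The `splits` argument in the training dataset example assigns a fraction of the rows to each named split. How the feature store does this internally is not shown on the slide; the sketch below is one simple way to get the same effect, using a seeded shuffle so the split is reproducible. The function name `split_rows` and the `seed` parameter are hypothetical.

```python
import random


def split_rows(rows, splits, seed=0):
    """Shuffle rows deterministically and cut them into named splits."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    result, start = {}, 0
    names = list(splits)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # last split takes the remainder
        else:
            end = start + round(splits[name] * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


parts = split_rows(range(100), {"train": 0.7, "test": 0.2, "validate": 0.1})
```

With 100 rows this yields 70, 20, and 10 rows respectively, and every row lands in exactly one split.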

FeatureGroup Time-Travel

fg = fs.get_feature_group("rain", 1)
fg.insert(upsert_df)
fg.commit_details()  # show log
fg.read("2020-12-15 09:00:01").show()
fg.read_changes("2020-12-14 09:00:01", "2020-12-15 09:00:01").show()
[Diagram: Feature Group (v1) as a sequence of commits: Commit1 (Timestamp1), Commit2 (Timestamp2), ..., Commitn (Timestampn)]

FeatureGroup Schema Versioning
[Diagram: Feature Group (v1) with commits Commit1 (Timestamp1), Commit2 (Timestamp2), ..., Commitn (Timestampn), next to Feature Group (v2); label: "latest commit of schema (v1)"]
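
Schema versioning and data versioning are separate axes: a feature group is identified by name and schema version, and each schema version carries its own sequence of data commits. The registry below is a toy sketch of that idea. It assumes, for simplicity, that any change to the set of columns requires a new version; `FeatureGroupRegistry` and its methods are hypothetical names, not the Hopsworks API.

```python
class FeatureGroupRegistry:
    """Toy registry: (name, version) -> schema plus a list of data commits."""

    def __init__(self):
        self._groups = {}

    def create(self, name, version, schema):
        key = (name, version)
        if key in self._groups:
            raise ValueError(f"{name} v{version} already exists")
        self._groups[key] = {"schema": frozenset(schema), "commits": []}

    def insert(self, name, version, rows):
        """Append a data commit; rows must match the schema of this version."""
        group = self._groups[(name, version)]
        for row in rows:
            if set(row) != group["schema"]:
                raise ValueError("row does not match schema; create a new version")
        group["commits"].append(list(rows))
        return len(group["commits"])  # commit number within this schema version


registry = FeatureGroupRegistry()
registry.create("rain", 1, ["date", "location_id", "precipitation"])
first_commit = registry.insert(
    "rain", 1, [{"date": "2020-12-14", "location_id": 1, "precipitation": 0.3}]
)
# Adding a column is modelled here as a new schema version.
registry.create("rain", 2, ["date", "location_id", "precipitation", "wind"])
```

Version 1 keeps its commit history untouched when version 2 is created, so existing readers of v1 are unaffected.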

[Diagram: fg.read("2020-12-15 09:00:01") reads Feature Group (v1) as of Commitn (2020-12-15 09:00:01)]
[Diagram: fg.read_changes("2020-12-14 09:00:01", "2020-12-15 09:00:01") reads the changes to Feature Group (v1) between Commit2 (2020-12-14 09:00:01) and Commitn (2020-12-15 09:00:01)]
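
The semantics of reading as of a timestamp and reading changes between two timestamps can be illustrated with an in-memory commit log of upserts. This is a sketch of the idea only, not how Hopsworks implements it: the class `CommitLog` is hypothetical, and treating the start of the `read_changes` interval as exclusive is an assumption.

```python
from bisect import bisect_right


class CommitLog:
    """Toy time-travel store: each commit upserts rows keyed by primary key."""

    def __init__(self):
        self._timestamps = []  # commit timestamps, strictly increasing
        self._upserts = []     # rows upserted by each commit

    def insert(self, timestamp, rows):
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("commits must have increasing timestamps")
        self._timestamps.append(timestamp)
        self._upserts.append(dict(rows))

    def commit_details(self):
        """List commit timestamps with the number of rows each one touched."""
        return [(ts, len(rows)) for ts, rows in zip(self._timestamps, self._upserts)]

    def read(self, as_of):
        """State of the table after all commits with timestamp <= as_of."""
        state = {}
        for upsert in self._upserts[:bisect_right(self._timestamps, as_of)]:
            state.update(upsert)
        return state

    def read_changes(self, start, end):
        """Rows upserted by commits with start < timestamp <= end."""
        lo = bisect_right(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        changes = {}
        for upsert in self._upserts[lo:hi]:
            changes.update(upsert)
        return changes


# Illustrative values; "YYYY-MM-DD HH:MM:SS" strings sort chronologically.
log = CommitLog()
log.insert("2020-12-14 09:00:01", {"stockholm": 0.1, "oslo": 0.7})
log.insert("2020-12-14 18:00:00", {"stockholm": 0.4, "dublin": 0.9})
log.insert("2020-12-15 09:00:01", {"dublin": 0.2})
```

Here `log.read("2020-12-15 09:00:01")` includes the unchanged "oslo" row, whereas `log.read_changes("2020-12-14 09:00:01", "2020-12-15 09:00:01")` returns only the rows touched by the two later commits.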

github.com/logicalclocks/hopsworks - @logicalclocks - www.logicalclocks.com
