SlideShare a Scribd company logo
1 of 47
Download to read offline
Feature Store: the missing data layer in ML pipelines?1
Hopsworks Hands On - Palo Alto
Kim Hammar
kim@logicalclocks.com
April 23, 2019
1
Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 1 / 22
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆy
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆy
?
_
_("))_/
_
2
Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo.
https://eng.uber.com/scaling-michelangelo/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆy
?
_
_("))_/
_
“Data is the hardest part of ML and the most important piece to
get right.
Modelers spend most of their time selecting and transforming
features at training time and then building the pipelines to deliver
those features to production models.”
- Uber2
2
Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo.
https://eng.uber.com/scaling-michelangelo/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆyFeature Store
“Data is the hardest part of ML and the most important piece to
get right.
Modelers spend most of their time selecting and transforming
features at training time and then building the pipelines to deliver
those features to production models.”
- Uber3
3
Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo.
https://eng.uber.com/scaling-michelangelo/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
Merging Our Data Intensive and Compute Intensive
Workloads
Compute IntensiveData Intensive
w(t)
Σ
Σ
Σ
Σ
σ
σ
σ
σ
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 3 / 22
Merging Our Data Intensive and Compute Intensive
Workloads
Compute IntensiveData Intensive
Feature Store
w(t)
Σ
Σ
Σ
Σ
σ
σ
σ
σ
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 3 / 22
Outline
1 What is a Feature Store
2 Why You Need a Feature Store
3 How to Build a Feature Store (Hopsworks Feature Store)
4 Demo
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 4 / 22
Solution: Disentangle ML Pipelines
with a Feature Store
Raw/Structured Data
Feature Store
Feature Engineering Training
Models
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
A feature store is a central vault for storing documented, curated, and
access-controlled features.
The feature store is the interface between data engineering and data
model development
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 5 / 22
Make ML-Features A First-Class Citizen in Your Data Lakes
Traditional Feature Engineering
N spark tasks Derived data Model Training
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Raw data lake
Feature Engineering w Feature Store
N spark tasks Derived data Model Training
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Raw data lake
Feature Store
Metadata
...
Make your features first-class citizens:
Document features
Version features
Invest in a data layer specifically for features (feature store)
Make features access-controlled and searchable
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 6 / 22
What is a Feature?
A feature is a measurable property of some data-sample
A feature could be..
An aggregate value (min, max, mean, sum)
A raw value (a pixel, a word from a piece of text)
A value from a database table (the age of a customer)
A derived representation: e.g an embedding or a cluster
Features are the fuel for AI systems:




x1
...
xn




Features
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Model θ
ˆy
Prediction
L(y, ˆy)
Loss
Gradient
θL(y, ˆy)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 7 / 22
Feature Engineering is Crucial for Model Performance
x1• • • • • •◦ ◦ ◦
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 8 / 22
Feature Engineering is Crucial for Model Performance
x1
x2
•
•
• •
•
•
◦
◦
◦
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 9 / 22
Feature Engineering is Crucial for Model Performance
x1
x2
•
•
• •
•
•
◦
◦
◦
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 10 / 22
Feature Engineering is Complex
Input Data
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale?
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale? How to manage the feature pipelines?
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale? How to manage the feature pipelines?
How to make features reusable and robust?
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale? How to manage the feature pipelines?
How to make features reusable and robust?
Feature Engineering is Complex Yet Crucial for Model Performance
Treat your features accordingly!
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
Feature Pipeline Jungles
Data Lake (Raw/Structured Data)
Feature Data (Derived Data)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 12 / 22
Disentangle Your ML Pipelines with a Feature Store
Data Sources Dataset 1 Dataset 2 . . . Dataset n
Feature Store
Feature Store
A data management platform for machine learning.
The interface between data engineering and data science.
Models
Models are trained using sets of features.
The features are fetched from the feature store
and can overlap between models.
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Analysis
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Analysis
Versioning
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Analysis
Versioning
Documentation
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
High-Level APIs and Abstractions
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
High-Level APIs and Abstractions
Read from the feature store
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
High-Level APIs and Abstractions
Read from the feature store
Write to the feature store
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
High-Level APIs and Abstractions
Read from the feature store
Write to the feature store
Metadata operations
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
Existing Feature Stores
Uber’s feature store4
Airbnb’s feature store5
Comcast’s feature store6
Facebook’s feature store7
GO-JEK’s feature store8
Twitter’s feature store9
Branch International’s feature store10
Hopsworks’ feature store11 (the only open-source one!)
4
Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International
Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine
Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL:
http://proceedings.mlr.press/v67/li17a.html.
5
Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform.
https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018.
6
Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions.
https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to-
predictions. 2018.
7
Kim Hazelwood et al. “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”. In:
Feb. 2018, pp. 620–629. DOI: 10.1109/HPCA.2018.00059.
8
Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18.
https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018.
9Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 15 / 22
The Components of a Feature Store
Feature Storage
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 16 / 22
The Components of a Feature Store
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 16 / 22
The Components of a Feature Store
Client Interface
API
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
Feature Registry
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 16 / 22
Hopsworks Feature Store
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
Hopsworks Feature Store
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
Hopsworks Feature Store
REST API
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
Hopsworks Feature Store
Client Interface
API
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
Hopsworks
Feature Registry
REST API
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
Hopsworks Feature Store
Client Interface
API
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
Hopsworks
Feature Registry
REST API
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
e1
e2
e3
e4
Gradient
Gradient
Gradient
Gradient
Model Training/Serving
Feature Engineering
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
Hopsworks Feature Store
Data Sources
HopsFS
Feature Creation Feature Storage
FeatureStoreAPI
Models
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 18 / 22
Feature Creation Example
[in:5, out:10.5, . . .]
Transactions
Spark
sum_trx_amount
from hops import featurestore
trx_df = spark.read.parquet(..)
trx_sum_amount_df = trx_df.select("amount,customer")
.groupBy("customer")
.agg(sum("amount"))
featurestore.create_featuregroup(
trx_sum_amount_df ,
"trx_sum_amount",
description="sum of transactions"
)
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 19 / 22
Feature
Computation
Raw/Structured Data
Data Lake
Feature Store
Curated Features
Model
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Demo-Setting
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 20 / 22
Summary
Machine learning comes with a high technical cost
Machine learning pipelines needs proper data management
A feature store is a place to store curated and documented features
The feature store serves as an interface between feature engineering
and model development, it can help disentangle complex ML pipelines
Hopsworks12 provides the world’s first open-source feature store
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
13
12
Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
13
Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou,
Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, and
Rasmus Toivonen
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 21 / 22
References
Hopsworks’ feature store14
HopsML15
Hopsworks16
14
Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/. 2018.
15
Logical Clocks AB. HopsML: Python-First ML Pipelines.
https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
16
Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 22 / 22

More Related Content

What's hot

Kim Hammar - Spotify ML Guild Meetup - Feature Stores
Kim Hammar - Spotify ML Guild Meetup - Feature StoresKim Hammar - Spotify ML Guild Meetup - Feature Stores
Kim Hammar - Spotify ML Guild Meetup - Feature StoresKim Hammar
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIQAware GmbH
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019Jim Dowling
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsAndrzej Michałowski
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)Simba Khadder
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineJan Wiegelmann
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKJan Wiegelmann
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingJan Wiegelmann
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...Josef A. Habdank
 

What's hot (20)

Kim Hammar - Spotify ML Guild Meetup - Feature Stores
Kim Hammar - Spotify ML Guild Meetup - Feature StoresKim Hammar - Spotify ML Guild Meetup - Feature Stores
Kim Hammar - Spotify ML Guild Meetup - Feature Stores
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AI
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systems
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)
 
Introduction to Hivemall
Introduction to HivemallIntroduction to Hivemall
Introduction to Hivemall
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
20160908 hivemall meetup
20160908 hivemall meetup20160908 hivemall meetup
20160908 hivemall meetup
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACK
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous Driving
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
 

Similar to Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019

Uddeholm ml workshop_hagfors_kim_hammar
Uddeholm ml workshop_hagfors_kim_hammarUddeholm ml workshop_hagfors_kim_hammar
Uddeholm ml workshop_hagfors_kim_hammarKim Hammar
 
Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store
Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature storeKim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store
Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature storeKim Hammar
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature EngineeringBigML, Inc
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering BigML, Inc
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision MakingBigML, Inc
 
T-Mobile and Elastic
T-Mobile and ElasticT-Mobile and Elastic
T-Mobile and ElasticElasticsearch
 
Python for Data Science - Python Brasil 11 (2015)
Python for Data Science - Python Brasil 11 (2015)Python for Data Science - Python Brasil 11 (2015)
Python for Data Science - Python Brasil 11 (2015)Gabriel Moreira
 
MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series BigML, Inc
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Makoto Yui
 
Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines MeetupKim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines MeetupKim Hammar
 
Operations is a Strategic Weapon (PuppetConf)
Operations is a Strategic Weapon (PuppetConf)Operations is a Strategic Weapon (PuppetConf)
Operations is a Strategic Weapon (PuppetConf)dev2ops
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsDomino Data Lab
 
BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBigML, Inc
 
Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Louis Dorard
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
Smart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning UpdateSmart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning UpdateDATAVERSITY
 
Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Jan Wedekind
 

Similar to Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019 (20)

Uddeholm ml workshop_hagfors_kim_hammar
Uddeholm ml workshop_hagfors_kim_hammarUddeholm ml workshop_hagfors_kim_hammar
Uddeholm ml workshop_hagfors_kim_hammar
 
Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store
Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature storeKim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store
Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision Making
 
FortranCalculus Class
FortranCalculus ClassFortranCalculus Class
FortranCalculus Class
 
T-Mobile and Elastic
T-Mobile and ElasticT-Mobile and Elastic
T-Mobile and Elastic
 
Python for Data Science - Python Brasil 11 (2015)
Python for Data Science - Python Brasil 11 (2015)Python for Data Science - Python Brasil 11 (2015)
Python for Data Science - Python Brasil 11 (2015)
 
MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series MLSEV. Logistic Regression, Deepnets, and Time Series
MLSEV. Logistic Regression, Deepnets, and Time Series
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17
 
Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines MeetupKim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup
 
Operations is a Strategic Weapon (PuppetConf)
Operations is a Strategic Weapon (PuppetConf)Operations is a Strategic Weapon (PuppetConf)
Operations is a Strategic Weapon (PuppetConf)
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
 
QGIS training class 2
QGIS training class 2QGIS training class 2
QGIS training class 2
 
Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Smart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning UpdateSmart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning Update
 
Metolabs 3Wai offerings
Metolabs 3Wai offeringsMetolabs 3Wai offerings
Metolabs 3Wai offerings
 
Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008
 

More from Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesKim Hammar
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHKim Hammar
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion ResponseKim Hammar
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlKim Hammar
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionKim Hammar
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security AutomationKim Hammar
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Kim Hammar
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Kim Hammar
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingKim Hammar
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...Kim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseKim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Kim Hammar
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayKim Hammar
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Kim Hammar
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Kim Hammar
 

More from Kim Hammar (20)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019

  • 1. Feature Store: the missing data layer in ML pipelines?1 Hopsworks Hands On - Palo Alto Kim Hammar kim@logicalclocks.com April 23, 2019 1 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 1 / 22
  • 2. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆy Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
  • 3. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆy ? _ _("))_/ _ 2 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
  • 4. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆy ? _ _("))_/ _ “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber2 2 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
  • 5. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆyFeature Store “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber3 3 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 2 / 22
  • 6. Merging Our Data Intensive and Compute Intensive Workloads Compute IntensiveData Intensive w(t) Σ Σ Σ Σ σ σ σ σ Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 3 / 22
  • 7. Merging Our Data Intensive and Compute Intensive Workloads Compute IntensiveData Intensive Feature Store w(t) Σ Σ Σ Σ σ σ σ σ Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 3 / 22
  • 8. Outline 1 What is a Feature Store 2 Why You Need a Feature Store 3 How to Build a Feature Store (Hopsworks Feature Store) 4 Demo Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 4 / 22
  • 9. Solution: Disentangle ML Pipelines with a Feature Store Raw/Structured Data Feature Store Feature Engineering Training Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy A feature store is a central vault for storing documented, curated, and access-controlled features. The feature store is the interface between data engineering and data model development Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 5 / 22
  • 10. Make ML-Features A First-Class Citizen in Your Data Lakes Traditional Feature Engineering N spark tasks Derived data Model Training b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Raw data lake Feature Engineering w Feature Store N spark tasks Derived data Model Training b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Raw data lake Feature Store Metadata ... Make your features first-class citizens: Document features Version features Invest in a data layer specifically for features (feature store) Make features access-controlled and searchable Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 6 / 22
  • 11. What is a Feature? A feature is a measurable property of some data-sample A feature could be.. An aggregate value (min, max, mean, sum) A raw value (a pixel, a word from a piece of text) A value from a database table (the age of a customer) A derived representation: e.g an embedding or a cluster Features are the fuel for AI systems:     x1 ... xn     Features b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 7 / 22
  • 12. Feature Engineering is Crucial for Model Performance x1• • • • • •◦ ◦ ◦ Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 8 / 22
  • 13. Feature Engineering is Crucial for Model Performance x1 x2 • • • • • • ◦ ◦ ◦ Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 9 / 22
  • 14. Feature Engineering is Crucial for Model Performance x1 x2 • • • • • • ◦ ◦ ◦ Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 10 / 22
  • 15. Feature Engineering is Complex Input Data Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 16. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 17. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 18. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 19. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 20. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 21. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? How to manage the feature pipelines? Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 22. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? How to manage the feature pipelines? How to make features reusable and robust? Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 23. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? How to manage the feature pipelines? How to make features reusable and robust? Feature Engineering is Complex Yet Crucial for Model Performance Treat your features accordingly! Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 11 / 22
  • 24. Feature Pipeline Jungles Data Lake (Raw/Structured Data) Feature Data (Derived Data) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 12 / 22
  • 25. Disentangle Your ML Pipelines with a Feature Store Data Sources Dataset 1 Dataset 2 . . . Dataset n Feature Store Feature Store A data management platform for machine learning. The interface between data engineering and data science. Models Models are trained using sets of features. The features are fetched from the feature store and can overlap between models. b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
  • 26. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
  • 27. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Analysis Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
  • 28. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Analysis Versioning Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
  • 29. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Analysis Versioning Documentation Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 13 / 22
  • 30. High-Level APIs and Abstractions from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
  • 31. High-Level APIs and Abstractions Read from the feature store from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
  • 32. High-Level APIs and Abstractions Read from the feature store Write to the feature store from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
  • 33. High-Level APIs and Abstractions Read from the feature store Write to the feature store Metadata operations from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 14 / 22
  • 34. Existing Feature Stores Uber’s feature store4 Airbnb’s feature store5 Comcast’s feature store6 Facebook’s feature store7 GO-JEK’s feature store8 Twitter’s feature store9 Branch International’s feature store10 Hopsworks’ feature store11 (the only open-source one!) 4 Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL: http://proceedings.mlr.press/v67/li17a.html. 5 Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform. https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018. 6 Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions. https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to- predictions. 2018. 7 Kim Hazelwood et al. “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”. In: Feb. 2018, pp. 620–629. DOI: 10.1109/HPCA.2018.00059. 8 Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18. https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018. 9Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 15 / 22
  • 35. The Components of a Feature Store Feature Storage Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 16 / 22
  • 36. The Components of a Feature Store Feature Metadata Schema Statistics Documentation Jobs Feature Storage Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 16 / 22
  • 37. The Components of a Feature Store Client Interface API from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) Feature Registry Feature Metadata Schema Statistics Documentation Jobs Feature Storage Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 16 / 22
  • 38. Hopsworks Feature Store Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
  • 39. Hopsworks Feature Store Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
  • 40. Hopsworks Feature Store REST API Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
  • 41. Hopsworks Feature Store Client Interface API from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) Hopsworks Feature Registry REST API Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
  • 42. Hopsworks Feature Store Client Interface API from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) Hopsworks Feature Registry REST API Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets e1 e2 e3 e4 Gradient Gradient Gradient Gradient Model Training/Serving Feature Engineering Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 17 / 22
  • 43. Hopsworks Feature Store Data Sources HopsFS Feature Creation Feature Storage FeatureStoreAPI Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 18 / 22
  • 44. Feature Creation Example [in:5, out:10.5, . . .] Transactions Spark sum_trx_amount from hops import featurestore trx_df = spark.read.parquet(..) trx_sum_amount_df = trx_df.select("amount,customer") .groupBy("customer") .agg(sum("amount")) featurestore.create_featuregroup( trx_sum_amount_df , "trx_sum_amount", description="sum of transactions" ) Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 19 / 22
  • 45. Feature Computation Raw/Structured Data Data Lake Feature Store Curated Features Model b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Demo-Setting Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 20 / 22
  • 46. Summary Machine learning comes with a high technical cost Machine learning pipelines needs proper data management A feature store is a place to store curated and documented features The feature store serves as an interface between feature engineering and model development, it can help disentangle complex ML pipelines Hopsworks12 provides the world’s first open-source feature store @hopshadoop www.hops.io @logicalclocks www.logicalclocks.com We are open source: https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops 13 12 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. 13 Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, and Rasmus Toivonen Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 21 / 22
  • 47. References Hopsworks’ feature store14 HopsML15 Hopsworks16 14 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. 15 Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018. 16 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store April 23, 2019 22 / 22