SlideShare a Scribd company logo
1 of 14
Download to read offline
Feature Store: the missing data layer in ML pipelines?1
FOSDEM 2019 Brussels | HPC, Big Data and Data Science DevRoom
Kim Hammar
kim@logicalclocks.com
@KimHammar1
February 1, 2019
1
Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 1 / 13
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆy
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 2 / 13
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆy
?
_
_("))_/
_
2
Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo.
https://eng.uber.com/scaling-michelangelo/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 3 / 13
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆy
?
_
_("))_/
_
“Data is the hardest part of ML and the most important piece to
get right.
Modelers spend most of their time selecting and transforming
features at training time and then building the pipelines to deliver
those features to production models.”
- Uber2
2
Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo.
https://eng.uber.com/scaling-michelangelo/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 3 / 13
ϕ(x)




y1
...
yn








x1,1 . . . x1,n
... . . .
...
xn,1 . . . xn,n




ˆyFeature Store
“Data is the hardest part of ML and the most important piece to
get right.
Modelers spend most of their time selecting and transforming
features at training time and then building the pipelines to deliver
those features to production models.”
- Uber3
3
Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo.
https://eng.uber.com/scaling-michelangelo/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 4 / 13
Solution: Disentangle ML Pipelines
with a Feature Store
Raw/Structured Data
Feature Store
Feature Engineering Training
Models
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
A feature store is a central vault for storing documented, curated, and
access-controlled features.
The feature store is the interface between data engineering and data
model development
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 5 / 13
Motivation for the Feature Store
The feature store enables:
Reusability of features between models and teams
Automatic backfilling of features
Automatic feature documentation and analysis
Feature versioning
Standardized access of features between training and serving
Feature discovery
Num. Curated Features in Feature Store
Avg.CostofanewMLProject
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 6 / 13
Reusing Features
Without a Feature Store is Complex
Data Sources Dataset 1 Dataset 2 . . . Dataset n




w1,1 . . . w1,n
...
...
...
wn,1 . . . wn,n




Features




w1,1 . . . w1,n
...
...
...
wn,1 . . . wn,n




Features




w1,1 . . . w1,n
...
...
...
wn,1 . . . wn,n




Features
Siloed Feature Sets
Without a feature store it is typical to
have feature sets stored in
isolation from each other.
Models
Models are trained using sets of features.
Without a feature store each model typically
defines its own feature definitions,
without feature sharing across models.
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 7 / 13
Reusing Features
With a Feature Store is Simple
Data Sources Dataset 1 Dataset 2 . . . Dataset n
Feature Store
Feature Store
A data management platform for machine learning.
The interface between data engineering and data science.
Models
Models are trained using sets of features.
The features are fetched from the feature store
and can overlap between models.
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 8 / 13
The Components of a Feature Store
The Storage Layer: For storing feature data in the feature store
The Metadata Layer: For storing feature metadata (versioning,
feature analysis, documentation, jobs)
The Feature Engineering Jobs: For computing features
The Feature Registry: A user interface to share and discover
features
The Feature Store API: For writing/reading to/from the feature
store
Feature Storage
Feature Metadata Jobs
Feature Registry API
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 9 / 13
Hopsworks Feature Store API
Reading from the Feature Store:
from hops import featurestore
features_df = featurestore.get_features([
"average_attendance",
"average_player_age"
])
Writing to the Feature Store:
from hops import featurestore
raw_data = spark.read.parquet(filename)
pol_features = raw_data.map(lambda x: x^2)
featurestore.insert_into_featuregroup(pol_features , "pol_featuregroup")
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 10 / 13
Hopsworks Feature Store API Service
SQL
hops-util
Query Planner
Feature Store Data
Hive on HopsFS
Feature Store
Metadata
Dataframe With Features
Client Interface Feature Store Service Output
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 11 / 13
Summary
Machine learning comes with a high technical cost
Machine learning pipelines needs proper data management
A feature store is a place to store curated and documented features
The feature store serves as an interface between feature engineering
and model development, it can help disentangle complex ML pipelines
Hopsworks4 provides the world’s first open-source feature store
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
5
4
Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
5
Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou,
Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, and Alex Ormenisan
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 12 / 13
References
Hopsworks’ feature store6 (the only open-source one!)
Uber’s feature store7
Airbnb’s feature store8
Comcast’s feature store9
GO-JEK’s feature store10
HopsML11
Hopsworks12
6
Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/. 2018.
7
Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International
Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine
Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL:
http://proceedings.mlr.press/v67/li17a.html.
8
Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform.
https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018.
9
Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions.
https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to-
predictions. 2018.
10
Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18.
https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018.
11
Logical Clocks AB. HopsML: Python-First ML Pipelines.
https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
12
Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 13 / 13

More Related Content

Similar to Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store

MLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseMLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseBigML, Inc
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3DataWorks Summit
 
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesPyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesJim Dowling
 
Sysml 2019 demo_paper
Sysml 2019 demo_paperSysml 2019 demo_paper
Sysml 2019 demo_paperstrange_loop
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Ed Fernandez
 
Demonstrating Warehousing Concepts Through Interactive Animations
Demonstrating Warehousing Concepts Through Interactive AnimationsDemonstrating Warehousing Concepts Through Interactive Animations
Demonstrating Warehousing Concepts Through Interactive Animationsertekg
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Eiratech Robotics Overview Presentation, 2019
Eiratech Robotics Overview Presentation, 2019Eiratech Robotics Overview Presentation, 2019
Eiratech Robotics Overview Presentation, 2019Alexander Belenky
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Northwestern 20181004 v9
Northwestern 20181004 v9Northwestern 20181004 v9
Northwestern 20181004 v9ISSIP
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailPaulo Zanini
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
3rd Hivemall meetup
3rd Hivemall meetup3rd Hivemall meetup
3rd Hivemall meetupMakoto Yui
 
Intel 20180608 v2
Intel 20180608 v2Intel 20180608 v2
Intel 20180608 v2ISSIP
 
Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1ISSIP
 
Eiratech Robotics Overview, July 2018
Eiratech Robotics Overview, July 2018Eiratech Robotics Overview, July 2018
Eiratech Robotics Overview, July 2018Alexander Belenky
 

Similar to Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store (20)

MLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseMLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3
 
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesPyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
 
Sysml 2019 demo_paper
Sysml 2019 demo_paperSysml 2019 demo_paper
Sysml 2019 demo_paper
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
 
Demonstrating Warehousing Concepts Through Interactive Animations
Demonstrating Warehousing Concepts Through Interactive AnimationsDemonstrating Warehousing Concepts Through Interactive Animations
Demonstrating Warehousing Concepts Through Interactive Animations
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Eiratech Robotics Overview Presentation, 2019
Eiratech Robotics Overview Presentation, 2019Eiratech Robotics Overview Presentation, 2019
Eiratech Robotics Overview Presentation, 2019
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Northwestern 20181004 v9
Northwestern 20181004 v9Northwestern 20181004 v9
Northwestern 20181004 v9
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics email
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
3rd Hivemall meetup
3rd Hivemall meetup3rd Hivemall meetup
3rd Hivemall meetup
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Intel 20180608 v2
Intel 20180608 v2Intel 20180608 v2
Intel 20180608 v2
 
Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1
 
Eiratech Robotics Overview, July 2018
Eiratech Robotics Overview, July 2018Eiratech Robotics Overview, July 2018
Eiratech Robotics Overview, July 2018
 

More from Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesKim Hammar
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHKim Hammar
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion ResponseKim Hammar
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlKim Hammar
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionKim Hammar
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security AutomationKim Hammar
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Kim Hammar
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Kim Hammar
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingKim Hammar
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...Kim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseKim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Kim Hammar
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayKim Hammar
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Kim Hammar
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Kim Hammar
 

More from Kim Hammar (20)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 

Recently uploaded

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Recently uploaded (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

Kim Hammar - FOSDEM 2019 Brussels - Hopsworks Feature store

  • 1. Feature Store: the missing data layer in ML pipelines?1 FOSDEM 2019 Brussels | HPC, Big Data and Data Science DevRoom Kim Hammar kim@logicalclocks.com @KimHammar1 February 1, 2019 1 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 1 / 13
  • 2. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆy Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 2 / 13
  • 3. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆy ? _ _("))_/ _ 2 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 3 / 13
  • 4. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆy ? _ _("))_/ _ “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber2 2 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 3 / 13
  • 5. ϕ(x)     y1 ... yn         x1,1 . . . x1,n ... . . . ... xn,1 . . . xn,n     ˆyFeature Store “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber3 3 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 4 / 13
  • 6. Solution: Disentangle ML Pipelines with a Feature Store Raw/Structured Data Feature Store Feature Engineering Training Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy A feature store is a central vault for storing documented, curated, and access-controlled features. The feature store is the interface between data engineering and data model development Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 5 / 13
  • 7. Motivation for the Feature Store The feature store enables: Reusability of features between models and teams Automatic backfilling of features Automatic feature documentation and analysis Feature versioning Standardized access of features between training and serving Feature discovery Num. Curated Features in Feature Store Avg.CostofanewMLProject Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 6 / 13
  • 8. Reusing Features Without a Feature Store is Complex Data Sources Dataset 1 Dataset 2 . . . Dataset n     w1,1 . . . w1,n ... ... ... wn,1 . . . wn,n     Features     w1,1 . . . w1,n ... ... ... wn,1 . . . wn,n     Features     w1,1 . . . w1,n ... ... ... wn,1 . . . wn,n     Features Siloed Feature Sets Without a feature store it is typical to have feature sets stored in isolation from each other. Models Models are trained using sets of features. Without a feature store each model typically defines its own feature definitions, without feature sharing across models. b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 7 / 13
  • 9. Reusing Features With a Feature Store is Simple Data Sources Dataset 1 Dataset 2 . . . Dataset n Feature Store Feature Store A data management platform for machine learning. The interface between data engineering and data science. Models Models are trained using sets of features. The features are fetched from the feature store and can overlap between models. b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 8 / 13
  • 10. The Components of a Feature Store The Storage Layer: For storing feature data in the feature store The Metadata Layer: For storing feature metadata (versioning, feature analysis, documentation, jobs) The Feature Engineering Jobs: For computing features The Feature Registry: A user interface to share and discover features The Feature Store API: For writing/reading to/from the feature store Feature Storage Feature Metadata Jobs Feature Registry API Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 9 / 13
  • 11. Hopsworks Feature Store API Reading from the Feature Store: from hops import featurestore features_df = featurestore.get_features([ "average_attendance", "average_player_age" ]) Writing to the Feature Store: from hops import featurestore raw_data = spark.read.parquet(filename) pol_features = raw_data.map(lambda x: x^2) featurestore.insert_into_featuregroup(pol_features , "pol_featuregroup") Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 10 / 13
  • 12. Hopsworks Feature Store API Service SQL hops-util Query Planner Feature Store Data Hive on HopsFS Feature Store Metadata Dataframe With Features Client Interface Feature Store Service Output Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 11 / 13
  • 13. Summary Machine learning comes with a high technical cost Machine learning pipelines needs proper data management A feature store is a place to store curated and documented features The feature store serves as an interface between feature engineering and model development, it can help disentangle complex ML pipelines Hopsworks4 provides the world’s first open-source feature store @hopshadoop www.hops.io @logicalclocks www.logicalclocks.com We are open source: https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops 5 4 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. 5 Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, and Alex Ormenisan Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 12 / 13
  • 14. References Hopsworks’ feature store6 (the only open-source one!) Uber’s feature store7 Airbnb’s feature store8 Comcast’s feature store9 GO-JEK’s feature store10 HopsML11 Hopsworks12 6 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. 7 Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL: http://proceedings.mlr.press/v67/li17a.html. 8 Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform. https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018. 9 Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions. https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to- predictions. 2018. 10 Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18. https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018. 11 Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018. 12 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store February 1, 2019 13 / 13