Dr. Jim Dowling
CEO / Co-Founder
Logical Clocks
Managed Feature Store
for ML Webinar
[ Presenter ]
Leadership & Offices
Stockholm
Box 1263,
Isafjordsgatan 22
Kista,
Sweden
London
IDEALondon,
69 Wilson St,
London,
UK
Silicon Valley
470 Ramona St
Palo Alto
California,
USA
Dr. Jim Dowling
CEO
Theo Kakantousis
COO
Prof. Seif Haridi
Chief Scientist
Fabio Buso
VP Engineering
Steffen Grohsschmiedt
Head Of Cloud
www.logicalclocks.com
Shraddha Chouhan
Head Of Marketing
Hopsworks - Award Winning Platform
Today’s Journey to a Feature Store and Beyond
Ad-hoc Scripts
and Jobs
Shared Feature
Pipelines
Feature Store
MLOps with a
Feature Store
Known Feature Stores in Production
● Logical Clocks – Hopsworks (world’s first open source)
● Uber Michelangelo
● Airbnb – Bighead/Zipline
● Comcast
● Twitter
● GO-JEK Feast (GCE, open-source layer over BigTable/BigQuery)
● Branch
● Conde Nast
● Facebook FB Learner
● Netflix
Reference: www.featurestore.org
What is a Feature?
A feature is a measurable property of a phenomenon under observation and
(part of) an input to a ML model.
Example features:
● A raw word, a pixel, a sound
wave, a sensor value;
● An aggregate
(mean, max, sum, min)
● A window
(last_hour, last_day, etc)
● A derived representation
(embedding or cluster)
numbers
(in arrays)
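To make the aggregate and window examples concrete, here is a minimal sketch in plain Python. The sensor readings, the `window_features` helper, and the one-hour window are illustrative assumptions, not part of any feature store API.

```python
from datetime import datetime, timedelta

# Hypothetical raw sensor readings: (timestamp, value)
readings = [
    (datetime(2020, 1, 1, 10, 5), 2.0),
    (datetime(2020, 1, 1, 11, 15), 4.0),
    (datetime(2020, 1, 1, 11, 45), 6.0),
]

def window_features(rows, now, window=timedelta(hours=1)):
    """Aggregate the values observed in the window (now - window, now]."""
    values = [v for ts, v in rows if now - window < ts <= now]
    if not values:
        return None  # no observations in the window
    return {
        "mean": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
    }

# "last_hour" features as seen at 12:00; the 10:05 reading falls outside the window
last_hour = window_features(readings, now=datetime(2020, 1, 1, 12, 0))
```

Each entry in `last_hour` is one feature: an aggregate computed over a window of raw values.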
A Data Engineer’s perspective on Feature Engineering
numbers
arrays
(of numbers)
one-hot
encoding
Databases
Schemas
varchar, charsets
integer, blob,
varbinary
Feature Engineering is about Transforming Data
from pyspark.ml.feature import Normalizer
# Assumes an active SparkSession named `spark`
scaledDF = spark.read.parquet("…")
l1_norm = Normalizer().setP(1.0).setInputCol("features").setOutputCol("l1_norm")
l1_norm.transform(scaledDF)
Normalize
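With p = 1, normalization divides each feature vector by the sum of the absolute values of its entries. The sketch below shows that arithmetic in plain Python, without Spark. The `l1_normalize` helper and the sample rows are assumptions for illustration, and leaving all-zero vectors unchanged is a choice made here, not a statement about any library's behavior.

```python
def l1_normalize(vector):
    """Scale a vector so that the absolute values of its entries sum to 1."""
    norm = sum(abs(x) for x in vector)
    if norm == 0:
        return list(vector)  # this sketch leaves all-zero vectors unchanged
    return [x / norm for x in vector]

# Hypothetical feature vectors, one per row
rows = [[1.0, 3.0], [2.0, -2.0], [0.0, 0.0]]
normalized = [l1_normalize(r) for r in rows]
```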
Consistent Features between Training and Inference
It’s not always trivial to ensure features are engineered
consistently between training and inference
Features
Training
Labels Model
Features
Inference
Model Labels
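One common way to keep features consistent is to define the feature logic once and call it from both the training and the inference path. The following is a minimal sketch of that idea. The function names and the passenger fields are hypothetical.

```python
def engineer_features(passenger):
    """Single definition of the feature logic, shared by both code paths."""
    return [
        float(passenger["pclass"]),
        1.0 if passenger["sex"] == "female" else 0.0,
    ]

def build_training_rows(passengers):
    # Training path: features plus the label
    return [(engineer_features(p), p["survived"]) for p in passengers]

def build_inference_row(passenger):
    # Inference path: the same features, no label available yet
    return engineer_features(passenger)

history = [
    {"pclass": 1, "sex": "female", "survived": 1},
    {"pclass": 3, "sex": "male", "survived": 0},
]
training_rows = build_training_rows(history)
request_row = build_inference_row({"pclass": 3, "sex": "male"})
```

Because both paths call `engineer_features`, a change to the encoding cannot reach training without also reaching inference.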
Feature Store – Reuse Cached Features
One
Feature
Pipeline
Get
Get
Features
Training
Labels Model
Features
Inference
Model Labels
Feature
Store
Features name Pclass Sex Survive Name Balance
Train / Test
Datasets
Survive name PClass Sex Balance
Join key
Feature
Groups
Titanic
Passenger List
Passenger
Bank Account
File format
.tfrecords
.npy
.csv
.hdf5,
.petastorm, etc
Storage
GCS
Amazon S3
HopsFS
Features, FeatureGroups, and Train/Test Datasets are all versioned
Feature Store Concepts
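The relationship between feature groups, a join key, and a versioned train/test dataset can be sketched with plain dictionaries. Everything below (the sample rows, `join_feature_groups`, and the dataset record layout) is a hypothetical illustration, not the Hopsworks implementation.

```python
# Two hypothetical feature groups, both keyed by passenger name (the join key)
passenger_fg = {
    "Alice": {"Pclass": 1, "Sex": "female", "Survived": 1},
    "Bob": {"Pclass": 3, "Sex": "male", "Survived": 0},
}
bank_account_fg = {
    "Alice": {"Balance": 1200.0},
    "Carol": {"Balance": 50.0},
}

def join_feature_groups(left, right, features):
    """Inner-join two feature groups on their shared key and select features."""
    rows = []
    for key in sorted(left.keys() & right.keys()):
        merged = {"name": key, **left[key], **right[key]}
        rows.append({f: merged[f] for f in features})
    return rows

# A training dataset is the joined, selected features plus a name and version
training_dataset = {
    "name": "titanic_training_dataset",
    "version": 1,
    "rows": join_feature_groups(
        passenger_fg,
        bank_account_fg,
        ["name", "Pclass", "Sex", "Balance", "Survived"],
    ),
}
```

Only keys present in both feature groups survive the inner join, so here the dataset contains a single row for Alice.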
Streaming App pushes click features every 5 secs
Streaming App pushes CDC data every 30 secs
Pandas App pushes user profile updates every hour
Batch App pushes featurized weblogs data every day
Online
Feature
Store
Offline
Feature
Store
SQL DW
S3, HDFS
SQL
Event Data
Real-Time Data
Real-time feature transformations (<2 secs) Online
App
Low
Latency
Features
High
Latency
Features
Train,
Batch App
FeatureGroups are ingested at different Cadences
Feature Store
No existing database is both scalable (PBs) and low latency (<10ms). Hence, online + offline Feature Stores.
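The split can be pictured as two write targets with different retention: the online store keeps only the latest row per key for fast lookups, while the offline store keeps the full history for training. This is a toy sketch under that assumption. The `DualFeatureStore` class and its methods are hypothetical and say nothing about how either store is actually implemented.

```python
class DualFeatureStore:
    """Toy stand-in: online keeps the latest row per key, offline keeps history."""

    def __init__(self):
        self.online = {}   # key -> (event_time, latest feature row)
        self.offline = []  # append-only log of (key, event_time, row)

    def insert(self, key, event_time, row):
        self.offline.append((key, event_time, row))
        current = self.online.get(key)
        # Only overwrite the online row if this event is at least as recent
        if current is None or event_time >= current[0]:
            self.online[key] = (event_time, row)

    def get_online(self, key):
        entry = self.online.get(key)
        return entry[1] if entry else None

    def get_offline(self, key):
        return [row for k, _, row in self.offline if k == key]

store = DualFeatureStore()
store.insert("user_1", 1, {"clicks_5s": 2})
store.insert("user_1", 2, {"clicks_5s": 5})
```

An online application would call `get_online`, while a training job would read the history through `get_offline`.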
Feature Store
ClickFeatureGroup
TableFeatureGroup
UserFeatureGroup
LogsFeatureGroup
Event Data
SQL DW
S3, HDFS
SQL
DataFrameAPI
Kafka Input
Flink
RTFeatureGroup
Online
App
Train,
Batch App
FeatureGroup ingestion in Hopsworks
User Clicks
DB Updates
User Profile Updates
Weblogs
Real-time features
Kafka Output
Simplify access to the online/offline Feature Stores by providing a general-purpose DataFrame API.
Register a Feature Group with the Feature Store
from hops import featurestore as fs
df = ...  # Spark or Pandas DataFrame
# Do feature engineering on 'df'
# Register DataFrame as FeatureGroup
fs.create_featuregroup(df, "titanic_df")
HOPSWORKS
Rest API
1 Add Metadata
2 Add Statistics
….
Offline FS
Apache Hive
HopsFS
(External)
Spark Cluster
.parquet, .orc (TLS)
Online FS
MySQL Cluster
fs.create_featuregroup(df, "titanic_df",
offline=True, online=True)
Feature Ingestion with Spark
Online
Feature Store
(Serving)
Offline
Feature Store
(Training & Batch)
Online Apps
Model Training
Batch Apps
Event Data
SQL DW
S3, HDFS
SQL
Ingest
Data
From
Used
By
Hopsworks Feature Store
Create Training Datasets using the Feature Store
from hops import featurestore as fs
sample_data = fs.get_features(["name", "Pclass", "Sex", "Balance", "Survived"])
fs.create_training_dataset(sample_data, "titanic_training_dataset",
data_format="tfrecords", training_dataset_version=1)
HOPSWORKS
Offline FS
Apache Hive
HopsFS
Join Features <<TLS>>
Online FS
MySQL Cluster
(External)
Spark Cluster
sample_data = fs.get_features(["name",
"Pclass", "Sex", "Balance", "Survived"])
Create Training Datasets with (External) Spark
Storage
GCS Amazon S3 HopsFS
.npy, .tfrecords, .csv
commit-0097
….
commit-0002
commit-0001
FeatureGroup
atomic
update
Feature Store
Time-Travel Queries for Creating Training Datasets
df = fs.get_features(…., from="2017", to="2019")
Storage
GCS Amazon S3 HopsFS
.tfrecords
.csv
.npy
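One way to picture a time-travel query is as a filter over a feature group's commit log. The sketch below assumes each commit carries a date and that the query range is half-open. The commit contents, the `read_between` helper, and those range semantics are assumptions for illustration, not the behavior of the `get_features` call shown on the slide.

```python
from datetime import date

# Hypothetical commit log for one feature group: (commit_id, commit_date, rows)
commits = [
    ("commit-0001", date(2016, 6, 1), [{"name": "Alice", "Balance": 100.0}]),
    ("commit-0002", date(2018, 3, 1), [{"name": "Bob", "Balance": 250.0}]),
    ("commit-0003", date(2020, 1, 15), [{"name": "Carol", "Balance": 75.0}]),
]

def read_between(log, start, end):
    """Return rows from commits whose date falls in [start, end)."""
    rows = []
    for _, committed, commit_rows in log:
        if start <= committed < end:
            rows.extend(commit_rows)
    return rows

# Rows committed from the start of 2017 up to, but not including, 2020
df_rows = read_between(commits, date(2017, 1, 1), date(2020, 1, 1))
```

Because each commit is applied atomically, a reader selecting by commit date never sees a half-written update.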
US-West-1a
MySQL
NDB1 Model
Online Application
1.JDBC 2.Predict
1. Build a Feature Vector using the Online Feature Store
US-West-1c
MySQL
NDB3 Model
~5-50ms
Online Feature Store: High Availability & Low-Latency
US-West-1b
MySQL
NDB2 Model
2-20ms
2. Send the Feature Vector to a Model for Prediction
HOPSWORKS
Rest API
Return JDBC Query
….
Offline FS
Apache Hive
HopsFS
Online FS
MySQL Cluster SELECT .. FROM WHERE … in [keys]
<<TLS>>
getQuery(“model”)
<<API-Key>> Online
Application
Online Feature Store: JDBC API
[keys]
user_id,
session_id,
timestamp, etc
Model
Prediction
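The lookup pattern on this slide, a primary-key `SELECT ... WHERE ... IN [keys]` that returns the values for a feature vector, can be sketched as follows. Python's built-in `sqlite3` is swapped in here purely so the example runs anywhere. The slides describe MySQL Cluster accessed over JDBC, and the table, columns, and `get_feature_vectors` helper are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the online feature store; table and columns are made up
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_features ("
    "user_id INTEGER PRIMARY KEY, clicks_last_hour INTEGER, avg_basket REAL)"
)
conn.executemany(
    "INSERT INTO user_features VALUES (?, ?, ?)",
    [(1, 12, 30.5), (2, 3, 8.0)],
)

def get_feature_vectors(connection, keys):
    """Look up feature rows by primary key with a parameterized IN query."""
    placeholders = ", ".join("?" for _ in keys)
    query = (
        "SELECT user_id, clicks_last_hour, avg_basket "
        f"FROM user_features WHERE user_id IN ({placeholders}) ORDER BY user_id"
    )
    # Drop the key column; the remaining values form the feature vector
    return [list(row[1:]) for row in connection.execute(query, list(keys))]

vectors = get_feature_vectors(conn, [1, 2])
```

The application would then pass each vector to the model for prediction, which is step 2 on the earlier slide.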
HOPSWORKS
APPLICATIONS
API
DASHBOARDS
HOPSWORKS
DATASOURCES
ORCHESTRATION
In Airflow
BATCH
Apache Beam
Apache Spark
STREAMING
Apache Beam
Apache Spark
Apache Flink
HOPSWORKS
FEATURE
STORE
DISTRIBUTED
ML & DL
Pip
Conda
Tensorflow
scikit-learn
PyTorch
Jupyter
Notebooks
Tensorboard
FILESYSTEM & METADATA STORAGE
HopsFS
MODEL
SERVING
Kubernetes
MODEL
MONITORING
Kafka
+
Spark Streaming
Data Preparation
& Ingestion
Experimentation
& Model Training
Deploy
& Productionalize
Apache
Kafka
1
Feature
Engineering
2
Feature
Selection
3
Training &
Validation
4 Serving 5 Prediction
Train/Test Data
(S3, HDFS, etc)
Online
Application
Batch
Application
Data Warehouse
Data Lake
Feature
Engineering
Offline
Feature Store
Feature
Selection
Scoring &
Validation
Train
Model
Serving
Online
Feature Store
Model
Repository
Monitor
Experiments
Deploy
Feature Vector
Kafka
ML Lifecycle
Stage 1. Data Engineer
Models
Stage 2. Data Scientist
Model APIs
Stage 3. ML Engineer
Intelligent App
Stage 4. App Developer
Features
Model Hyperparameters
Model Candidates
Feature
Selection
Training Data
Test Data
Model
Design
Model
Architecture
Model
Architecture
Model
Architecture
Model
Architecture
Model Repository
Model
Architecture
Model
Architecture
Model
Architecture
Trial
Data Scientist
Experiments
Model Validation
Batch Apps
Online
Application
Predict
Get Online Features
App Developer
Redshift S3 Cassandra Hadoop
Feature
Engineering
Feature Store
Data Engineer
Kubernetes / Serverless
KPI Dashboards
Alerts
Actions
Model
Architecture
Model
Architecture
Model
Architecture
Model
Architecture
Model
Kafka
Model Inference API
Log Predictions
Predict
Streaming or
Serverless
Monitoring App
Log Predictions and
Join Outcomes
Online Model Serving
ML Engineer
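The "Log Predictions and Join Outcomes" step can be sketched as a join between a prediction log and outcomes that arrive later, followed by a simple quality metric. The log format, the request ids, and both helper functions are hypothetical. A real monitoring application would read these records from a stream such as Kafka rather than from in-memory lists.

```python
# Hypothetical prediction log and later-arriving outcomes, keyed by request id
prediction_log = [
    {"request_id": "r1", "prediction": 1},
    {"request_id": "r2", "prediction": 0},
    {"request_id": "r3", "prediction": 1},
]
outcomes = {"r1": 1, "r2": 1}  # r3 has no outcome yet

def join_outcomes(predictions, observed):
    """Pair each logged prediction with its outcome, skipping unresolved ones."""
    return [
        (p["prediction"], observed[p["request_id"]])
        for p in predictions
        if p["request_id"] in observed
    ]

def accuracy(pairs):
    """Fraction of joined predictions that matched the observed outcome."""
    if not pairs:
        return None
    return sum(1 for pred, actual in pairs if pred == actual) / len(pairs)

joined = join_outcomes(prediction_log, outcomes)
live_accuracy = accuracy(joined)
```

A value like `live_accuracy` is the kind of signal that could feed the KPI dashboards and alerts shown in the diagram.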
Feature Store
Offline Features (Hive)
Secure Multi-Tenancy
Role-based Access Control
Encryption At-Rest, In-Motion
TLS/SSL everywhere
AI-Asset Governance
Models, experiments, data, GPUs
Data/Model/Feature Lineage
Discover/track dependencies
Real-Time, HA Database
MySQL Cluster (NDB)
JDBC API for Serving Clients
Online apps only need JDBC
In-Memory or NVMe data
Single-digit ms query times
Apache Hive on HopsFS
Scalable Data warehouse
Spark for Feature Computing
Fast backfilling of Training Data
HopsFS
NVMe speed with Big Data
HA and Horizontally Scalable
From 1 to 100s of nodes and
PBs of data
Hive
HA and Horizontally Scalable
Add nodes with no downtime
and scale to 10s of TBs
JDBC
NDB
NVMe
Security & Governance
Online Features (NDB)
Agenda for demo
Feature Store Overview
Access control / governance / statistics
Creating Features
Online vs Offline Features
Search for Features
Create training dataset
Query planner and hints
Online Feature Store
JDBC API for the Online Feature Store
Hopsworks Subscription Models
Full Featured
AGPL-v3 License Model
Hopsworks Community
Kubernetes Support
• Model Serving
• Other services for robustness (Jupyter, more coming)
Authentication (LDAP, Kerberos, OAuth2)
GitHub support
Hopsworks Enterprise
Try it out!
www.hopsworks.ai

Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 

Managed Feature Store for Machine Learning

  • 1. Dr. Jim Dowling CEO / Co-Founder Logical Clocks Managed Feature Store for ML Webinar [ Presenter ]
  • 2. Leadership & Offices Stockholm Box 1263, Isafjordsgatan 22 Kista, Sweden London IDEALondon, 69 Wilson St, London, UK Silicon Valley 470 Ramona St Palo Alto California, USA Dr. Jim Dowling CEO Theo Kakantousis COO Prof. Seif Haridi Chief Scientist Fabio Buso VP Engineering Steffen Grohsschmiedt Head Of Cloud www.logicalclocks.com Shraddha Chouhan Head Of Marketing
  • 3. Hopsworks - Award Winning Platform
  • 4. Today’s Journey to a Feature Store and Beyond Ad-hoc Scripts and Jobs Shared Feature Pipelines Feature Store MLOps with a Feature Store
  • 5. Known Feature Stores in Production ● Logical Clocks – Hopsworks (world’s first open source) ● Uber Michelangelo ● Airbnb – Bighead/Zipline ● Comcast ● Twitter ● GO-JEK Feast (GCE, open-source layer over BigTable/BigQuery) ● Branch ● Conde Nast ● Facebook FB Learner ● Netflix Reference: www.featurestore.org
  • 6. What is a Feature? A feature is a measurable property of a phenomenon under observation and (part of) an input to an ML model. Example features: ● A raw word, a pixel, a sound wave, a sensor value; ● An aggregate (mean, max, sum, min) ● A window (last_hour, last_day, etc) ● A derived representation (embedding or cluster)
  • 7. numbers (in arrays) A Data Engineer’s perspective on Feature Engineering numbers arrays (of numbers) one-hot encoding Databases Schemas varchar, charsets integer, blob, varbinary
  • 8. Feature Engineering is about Transforming Data
  • 9. Feature Engineering is about Transforming Data from pyspark.ml.feature import Normalizer scaledDF = spark.read.parquet("…") l1_norm = Normalizer().setP(1).setInputCol("features").setOutputCol("l1_norm") l1_norm.transform(scaledDF) Normalize
  • 10. Consistent Features between Training and Inference It’s not always trivial to ensure features are engineered consistently between training and inference Features Training Labels Model Features Inference Model Labels
  • 11. Feature Store – Reuse Cached Features One Feature Pipeline Get Get Features Training Labels Model Features Inference Model Labels Feature Store
  • 12. Features name Pclass Sex Survive Name Balance Train / Test Datasets Survive name PClass Sex Balance Join key Feature Groups Titanic Passenger List Passenger Bank Account File format .tfrecords .npy .csv .hdf5, .petastorm, etc Storage GCS Amazon S3 HopsFS Features, FeatureGroups, and Train/Test Datasets are all versioned Feature Store Concepts
  • 13. Streaming App pushes click features every 5 secs Streaming App pushes CDC data every 30 secs Pandas App pushes user profile updates every hour Batch App pushes featurized weblogs data every day Online Feature Store Offline Feature Store SQL DW S3, HDFS SQL Event Data Real-Time Data Real-time feature transformations (<2 secs) Online App Low Latency Features High Latency Features Train, Batch App FeatureGroups are ingested at different Cadences Feature Store No existing database is both scalable (PBs) and low latency (<10ms). Hence, online + offline Feature Stores.
  • 14. Feature Store ClickFeatureGroup TableFeatureGroup UserFeatureGroup LogsFeatureGroup Event Data SQL DW S3, HDFS SQL DataFrameAPI Kafka Input Flink RTFeatureGroup Online App Train, Batch App FeatureGroup ingestion in Hopsworks User Clicks DB Updates User Profile Updates Weblogs Real-time features Kafka Output Simplify access to the online/offline Feature Stores by providing a general-purpose DataFrame API.
  • 15. Register a Feature Group with the Feature Store from hops import featurestore as fs df = ... # Spark or Pandas DataFrame # Do feature engineering on 'df' # Register DataFrame as FeatureGroup fs.create_featuregroup(df, "titanic_df")
  • 16. HOPSWORKS REST API 1 Add Metadata 2 Add Statistics …. Offline FS Apache Hive HopsFS (External) Spark Cluster .parquet, .orc (TLS) Online FS MySQL Cluster fs.create_featuregroup(df, "titanic_df", offline=True, online=True) Feature Ingestion with Spark
  • 17. Online Feature Store (Serving) Offline Feature Store (Training & Batch) Online Apps Model Training Batch Apps Event Data SQL DW S3, HDFS SQL Ingest Data From Used By Hopsworks Feature Store
  • 18. Create Training Datasets using the Feature Store from hops import featurestore as fs sample_data = fs.get_features(["name", "Pclass", "Sex", "Balance", "Survived"]) fs.create_training_dataset(sample_data, "titanic_training_dataset", data_format="tfrecords", training_dataset_version=1)
  • 19. HOPSWORKS Offline FS Apache Hive HopsFS Join Features <<TLS>> Online FS MySQL Cluster (External) Spark Cluster sample_data = fs.get_features(["name", "Pclass", "Sex", "Balance", "Survived"]) Create Training Datasets with (External) Spark Storage GCS Amazon S3 HopsFS .npy, .tfrecords, .csv
  • 20. commit-0097 …. commit-0002 commit-0001 FeatureGroup atomic update Feature Store Time-Travel Queries for Creating Training Datasets df = fs.get_features(…., from="2017", to="2019") Storage GCS Amazon S3 HopsFS .tfrecords .csv .npy
  • 21. US-West-1a MySQL NDB1 Model Online Application 1.JDBC 2.Predict 1. Build a Feature Vector using the Online Feature Store US-West-1c MySQL NDB3 Model ~5-50ms Online Feature Store: High Availability & Low-Latency US-West-1b MySQL NDB2 Model 2-20ms 2. Send the Feature Vector to a Model for Prediction
  • 22. HOPSWORKS REST API Return JDBC Query …. Offline FS Apache Hive HopsFS Online FS MySQL Cluster SELECT .. FROM WHERE … in [keys] <<TLS>> getQuery("model") <<API-Key>> Online Application Online Feature Store: JDBC API [keys] user_id, session_id, timestamp, etc Model Prediction
  • 24. APPLICATIONS API DASHBOARDS HOPSWORKS DATASOURCES ORCHESTRATION In Airflow BATCH Apache Beam Apache Spark STREAMING Apache Beam Apache Spark Apache Flink HOPSWORKS FEATURE STORE DISTRIBUTED ML & DL Pip Conda Tensorflow scikit-learn PyTorch Jupyter Notebooks Tensorboard FILESYSTEM & METADATA STORAGE HopsFS MODEL SERVING Kubernetes MODEL MONITORING Kafka + Spark Streaming Data Preparation & Ingestion Experimentation & Model Training Deploy & Productionalize Apache Kafka
  • 25. 1 Feature Engineering 2 Feature Selection 3 Training & Validation 4 Serving 5 Prediction Train/Test Data (S3, HDFS, etc) Online Application Batch Application Data Warehouse Data Lake Feature Engineering Offline Feature Store Feature Selection Scoring & Validation Train Model Serving Online Feature Store Model Repository Monitor Experiments Deploy Feature Vector Kafka
  • 26. ML Lifecycle Stage 1. Data Engineer Models Stage 2. Data Scientist Model APIs Stage 3. ML Engineer Intelligent App Stage 4. App Developer Features Model Hyperparameters Model Candidates Feature Selection Training Data Test Data Model Design Model Architecture Model Architecture Model Architecture Model Architecture Model Repository Model Architecture Model Architecture Model Architecture Trial Data Scientist Experiments Model Validation Batch Apps Online Application Predict Get Online Features App Developer Redshift S3 Cassandra Hadoop Feature Engineering Feature Store Data Engineer Kubernetes / Serverless KPI Dashboards Alerts Actions Model Architecture Model Architecture Model Architecture Model Architecture Model Kafka Model Inference API Log Predictions Predict Streaming or Serverless Monitoring App Log Predictions and Join Outcomes Online Model Serving ML Engineer
  • 27. Feature Store Offline Features (Hive) Secure Multi-Tenancy Role-based Access Control Encryption At-Rest, In-Motion TLS/SSL everywhere AI-Asset Governance Models, experiments, data, GPUs Data/Model/Feature Lineage Discover/track dependencies Real-Time, HA Database MySQL Cluster (NDB) JDBC API for Serving Clients Online apps only need JDBC In-Memory or NVMe data Single-digit ms query times Apache Hive on HopsFS Scalable Data warehouse Spark for Feature Computing Fast backfilling of Training Data HopsFS NVMe speed with Big Data HA and Horizontally Scalable From 1 to 100s of nodes and PBs of data Hive HA and Horizontally Scalable Add nodes with no downtime and scale to 10s of TBs JDBC NDB NVMe Security & Governance Online Features (NDB)
  • 28. Agenda for demo Feature Store Overview Access control / governance / statistics Creating Features Online vs Offline Features Search for Features Create training dataset Query planner and hints Online Feature Store JDBC API for the online Feature Store
  • 29. Hopsworks Subscription Models Full Featured AGPL-v3 License Model Hopsworks Community Kubernetes Support • Model Serving • Other services for robustness (Jupyter, more coming) Authentication (LDAP, Kerberos, OAuth2) Github support Hopsworks Enterprise
  • 30. Try it out! www.hopsworks.ai Stockholm Box 1263, Isafjordsgatan 22 Kista, Sweden London IDEALondon, 69 Wilson St, London, UK Silicon Valley 470 Ramona St Palo Alto California, USA www.logicalclocks.com
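
Slides 9 to 11 argue that features should come from one pipeline so that training and inference see identical values. The idea can be sketched in plain Python, without Spark or Hopsworks: a single function applies the same L1 normalization on both paths. The feature names (`clicks`, `purchases`, `visits`) and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def l1_normalize(values: List[float]) -> List[float]:
    """Scale a vector so that its absolute values sum to 1 (L1 norm)."""
    total = sum(abs(v) for v in values)
    if total == 0:
        return list(values)  # an all-zero vector is returned unchanged
    return [v / total for v in values]


def build_features(raw: Dict[str, float]) -> List[float]:
    """The one feature pipeline, called by both training and inference code."""
    # Hypothetical raw fields; the column order is fixed here, in one place.
    ordered = [raw["clicks"], raw["purchases"], raw["visits"]]
    return l1_normalize(ordered)


# Training and serving call the same function, so the values cannot drift apart.
training_row = build_features({"clicks": 2.0, "purchases": 1.0, "visits": 1.0})
serving_row = build_features({"clicks": 2.0, "purchases": 1.0, "visits": 1.0})
print(training_row == serving_row)
```

A feature store takes this one step further by caching the output of the pipeline, so that consumers read stored features instead of recomputing them.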
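
Slide 12 describes a train/test dataset as a selection of features joined across feature groups on a join key. The following is a minimal in-memory sketch of that join, not the `hops` API: the function `join_feature_groups`, the sample passengers, and their values are all made up for illustration, and an inner join on the passenger name is assumed.

```python
from typing import Any, Dict, List

# Two hypothetical feature groups, both keyed by passenger name (the join key).
passenger_list = {
    "Alice": {"Pclass": 1, "Sex": "female", "Survived": 1},
    "Bob": {"Pclass": 3, "Sex": "male", "Survived": 0},
}
bank_account = {
    "Alice": {"Balance": 1200.0},
    "Bob": {"Balance": 300.0},
    "Carol": {"Balance": 50.0},  # no passenger record, so dropped by the join
}


def join_feature_groups(
    names: List[str], feature_groups: List[Dict[str, Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Inner-join feature groups on their shared key and keep the named features."""
    common_keys = set.intersection(*(set(group) for group in feature_groups))
    rows = []
    for key in sorted(common_keys):
        merged: Dict[str, Any] = {"name": key}
        for group in feature_groups:
            merged.update(group[key])
        rows.append({n: merged[n] for n in names})
    return rows


training_rows = join_feature_groups(
    ["name", "Pclass", "Balance", "Survived"], [passenger_list, bank_account]
)
for row in training_rows:
    print(row)
```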
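
Slides 21 and 22 show an online application building a feature vector with a keyed SQL query (`SELECT .. FROM WHERE … in [keys]`) before calling the model. The sketch below illustrates that lookup pattern using Python's built-in `sqlite3` module as a stand-in for MySQL Cluster over JDBC. The table `user_features`, its columns, and the helper `get_feature_vectors` are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database standing in for the online feature store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_features ("
    "user_id INTEGER PRIMARY KEY, clicks_last_hour INTEGER, balance REAL)"
)
conn.executemany(
    "INSERT INTO user_features VALUES (?, ?, ?)",
    [(1, 12, 1200.0), (2, 3, 300.0), (3, 0, 50.0)],
)


def get_feature_vectors(connection, keys):
    """Fetch one feature vector per key with a single primary-key IN query."""
    if not keys:
        return []
    # One placeholder per key keeps the query parameterized.
    placeholders = ", ".join("?" for _ in keys)
    query = (
        "SELECT user_id, clicks_last_hour, balance FROM user_features "
        f"WHERE user_id IN ({placeholders}) ORDER BY user_id"
    )
    return connection.execute(query, list(keys)).fetchall()


vectors = get_feature_vectors(conn, [1, 3])
print(vectors)
```

Because the lookup is a primary-key read, the application needs nothing beyond a SQL connection, which matches the claim on slide 27 that online apps only need JDBC.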