Jim Dowling
CEO, Logical Clocks
Cloud Native London Meetup, March 3 2021
Building Hopsworks, a cloud-native managed
feature store for machine learning
Can we make a Monolith fly in the clouds?
The Hopsworks Feature Store - Available on all Platforms as Managed, Enterprise, and Community
hopsworks.ai
(managed platform)
Enterprise Hopsworks
(self-hosted platform)
Community Hopsworks**
(self-hosted platform)
Runs on any Platform*
(On-premise, Cloud, VMs, etc)
Runs on any Platform*
(On-premise, Cloud, VMs, etc)
*Supported operating systems: RHEL/Centos 7.x and Ubuntu 18.04. Minimum Requirements: 32GB RAM, 100GB disk, 8 CPUs. Runs in air-gapped environments.
**Community Hopsworks does not include (1) Feature Store Connectors to Third-Party Platforms and (2) SSO with Active Directory/OAuth-2/Azure-AD/AWS.
2016 2018 2020
Only Managed Feature Store available
today on both AWS and Azure
When do I need a Feature
Store for Machine Learning
and what is it anyway?
Business Problem: Use Machine Learning to Predict Money Laundering
Reference: Whitepaper, Webinar
What data can I use to solve my Anti-Money Laundering Problem with?
Know Your Customer Data
Historical Financial Transactions
Recent
Financial
Transactions
Data Warehouse
Data Lake
Message Bus
TRAIN
SERVE
It is not always easy to get access to Enterprise data for training and serving.
What data can I use to make predictions with?
Know Your Customer Data
Historical Financial Transactions
Recent
Financial
Transactions
Data Warehouse
Data Lake
Message Bus
TRAIN
SERVE
Feature
Store
Where does the Feature Store fit into the ML Pipeline?
FEATURE STORE
TRAIN / SERVE
FEATURIZE
Offline Feature Store - Create Training Data and Batch Predictions
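# Join the three feature groups into a single query.
df = kycFG.select_all().join(rftFG.select_all()).join(hftFG.select_all())

# Define a versioned training dataset, split into train/test/validate sets.
td = fs.create_training_dataset("precipitation_training_dataset",
                                version=1,
                                data_format="tfrecord",
                                description="Precipitation Training dataset",
                                splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})

# Materialize the joined features as training data.
td.save(df)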
FG=Feature Group https://docs.hopsworks.ai/
Feature Store
kycFG
rftFG
hftFG
Training Data
(.tfrecord)
Model
train
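As a rough illustration of what the `splits` argument above asks for, the sketch below partitions row indices into train/test/validate subsets in 0.7/0.2/0.1 proportions. It is not how Hopsworks implements splitting: `split_rows` is a hypothetical helper, and the shuffle with a fixed seed is an assumption made so the result is reproducible.

```python
import random


def split_rows(n_rows, splits, seed=42):
    """Shuffle row indices and cut them into named, contiguous slices.

    `splits` maps split name -> fraction; the fractions are assumed to sum to 1.
    Hypothetical helper, for illustration only.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    result, start = {}, 0
    names = list(splits)
    for i, name in enumerate(names):
        # The last split takes whatever remains, so no row is lost to rounding.
        end = n_rows if i == len(names) - 1 else start + round(n_rows * splits[name])
        result[name] = indices[start:end]
        start = end
    return result


parts = split_rows(1000, {'train': 0.7, 'test': 0.2, 'validate': 0.1})
sizes = {name: len(rows) for name, rows in parts.items()}
print(sizes)
```

Every row lands in exactly one split, which is the property that matters when the test and validation sets are later used to evaluate a model trained on the train split.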
Online Feature Store - the Data Layer for Operational (Online) Models
US-West-1c
US-West-1a
US-West-1b
RonDB2
Model
RonDB1 Model
RonDB3
Model
Online Application
1. JDBC 2. Predict
2-20ms
1. Build Feature Vector Using Online Feature Store
2. Send Feature Vector to Model for Prediction
~5-50ms
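From the online application's point of view, the two steps above might look like the following. This is a minimal sketch under stated assumptions: a plain dictionary stands in for the online feature store (in the slides this is a JDBC lookup against RonDB), `predict` stands in for a call to a served model, and the feature names, customer id, and thresholds are invented for illustration.

```python
import time

# Stand-in for the online feature store: primary key -> latest feature values.
# In the architecture above this would be a low-latency database lookup.
ONLINE_STORE = {
    "cust_42": {"kyc_risk_score": 0.8, "txn_count_24h": 17, "avg_txn_amount_30d": 250.0},
}

# The model expects features in a fixed order.
FEATURE_ORDER = ["kyc_risk_score", "txn_count_24h", "avg_txn_amount_30d"]


def build_feature_vector(customer_id):
    """Step 1: look up precomputed features by key and order them for the model."""
    row = ONLINE_STORE[customer_id]
    return [row[name] for name in FEATURE_ORDER]


def predict(feature_vector):
    """Step 2: stand-in for the model-serving call; a toy rule, not a real model."""
    kyc_risk, txn_count, _avg_amount = feature_vector
    return kyc_risk > 0.5 and txn_count > 10


start = time.perf_counter()
vector = build_feature_vector("cust_42")
flagged = predict(vector)
elapsed_ms = (time.perf_counter() - start) * 1000
print(vector, flagged, f"{elapsed_ms:.3f} ms")
```

The application only supplies the key; the feature values themselves are precomputed by feature pipelines, which is why the lookup latency of the online store sits directly on the prediction path.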
Code and
configuration
Data Lake,
Warehouse,
Kafka
Model
Registry
Feature
Engineering
Model
Serving
Model
Training
Model
Deploy
Model
Monitoring
Model
Development
Features
Retrieve Features
Log Predictions Training Data Statistics
Sync
HopsFS
Scaleout
Metadata
Experiment
Tracking
Programs
Feature Statistics
A/B Test
Model
Statistics
Serving
Statistics
Search (Artifacts,
Provenance and
Metadata)
Feature
Store
Elasticsearch
Experiments
Hopsworks End-to-End Machine Learning (ML) Pipelines
Hopsworks - Develop and Operate ML Applications at Scale
APPLICATIONS
API
DASHBOARDS
HOPSWORKS
DATASOURCE
ORCHESTRATION
Airflow
BATCH
Apache Spark
STREAMING
Apache Spark
Apache Flink
HOPSWORKS
FEATURE
STORE
ML DEVELOP
AND TRAIN
Notebooks as Jobs
Tensorflow
Scikit-Learn
PyTorch
Tensorboard
FILESYSTEM & METASTORE
HopsFS
MODEL
SERVING AND
MONITORING
KFServing
TF-Serving
Flask
Data Preparation
& Ingestion
Experimentation
& Model Training
Deploy
& Productionalize
Apache
Kafka
Transitioning Security to
the Cloud….
Project-Based Multi-Tenant Security Model
Moving to the Cloud - Connectors and Integrations
Hopsworks
Project-Based Multi-Tenant Security
API
KEY
IAM Profile or Federated IAM Role
Users
Jobs
Dev Feature Store
Staging Feature Store
Prod Feature Store
User
Login
(LDAP, AD,
OAuth2, 2FA)
databricks
SageMaker
Kubeflow
Amazon EMR
Delta Lake
Snowflake
Amazon S3
Amazon
Redshift
Making Hopsworks Cloud-Native
Hopsworks Open Source            Cloud Native Service
Open-Source Docker Repository    ECR / ACR
Kubernetes                       EKS / AKS

Hopsworks Services               Rejected Cloud Native Versions
Spark-on-YARN                    Databricks / EMR
HopsFS                           S3
RonDB                            DynamoDB / ElastiCache
Kafka                            Managed Kafka
Elastic Open Distro              AWS Elastic
Developing
Hopsworks.ai
The first European Company to provide a managed
scale-out data and AI platform in the cloud
Hopsworks.ai
Early 2020
Nov 2020 (GA)
Serverless Platform on AWS - Amplify, Cognito, CloudFront, Lambdas, Route 53, DynamoDB
Lambdas
Integration with other Platforms - Databricks
Cloud-Native Kubernetes Integration
https://www.logicalclocks.com/blog/how-we-secure-your-data-with-hopsworks
HopsFS Hive
Elastic
Kafka
Hopsworks
Pod
User
Project Creation
Kubernetes (EKS, AKS)
Access
using
X.509 /JWT
Secrets
Project_User
X.509
JWT
Project_User
X.509
JWT
Project_
User
X.509
JWT
Jobs UI
Project-User
Docker Container
Project
Jobs
2
1
1
API
server
Scheduler
2
DynamoDB
Expensive,
High Latency (~10ms lookup),
Limited Query Support - Reporting a Problem,
Quotas, Hotspots
RonDB - a new open-source cloud-native distribution of NDB (MySQL Cluster)
Inventor of NDB
(MySQL Cluster)
www.rondb.com
RonDB vs Redis - RonDB outperforms on 1 CPU Core and Keeps on Scaling
MySQL Cluster (NDB) - the world’s highest throughput transactional datastore
200m ops/second with NDB - world’s fastest key-value store
RonDB - the first LATS Database in the Cloud. Launched in private beta Feb 2021.
RonDB is a LATS Database
low Latency, high Availability, high Throughput, scalable Storage
< 1ms KV lookup
>10M KV Lookups/sec
>99.999% availability
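To put the availability figure above in concrete terms, the short calculation below converts an availability percentage into a yearly downtime budget. It is plain arithmetic, not a RonDB measurement; the 365-day year is a simplifying assumption and `downtime_minutes_per_year` is a name chosen for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Return the maximum downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = downtime_minutes_per_year(99.999)
print(f"{five_nines:.2f} minutes/year")
```

At 99.999% the budget is a little over five minutes of downtime per year.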
Lessons Learnt
(so far)
Lessons Learnt (so far) in building a Cloud Native Managed Data/AI Platform
Shiny new toys are not always the best
● Lambda functions are a poor fit for synchronous events (e.g. request-reply) due to their slow response times
○ Unsuitable for "web" endpoints: 500-2000 ms response time
○ Cold lambdas contribute, but so does the JS JIT
○ Parallel operations are difficult due to lack of support in Lambda
● “Amplifeck’d” is a common word on our Slack
● SQL > Key Value APIs
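One possible reading of the last point: with a key-value API, assembling a feature vector from several feature groups takes one lookup per group, stitched together in application code, whereas SQL can return the same vector with a single join. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a SQL-capable online store; the table names, column names, and the `kv_lookup` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kyc (customer_id TEXT PRIMARY KEY, risk_score REAL);
    CREATE TABLE recent_txn (customer_id TEXT PRIMARY KEY, txn_count_24h INTEGER);
    INSERT INTO kyc VALUES ('cust_42', 0.8);
    INSERT INTO recent_txn VALUES ('cust_42', 17);
""")


def kv_lookup(table, column, key):
    """Key-value style access: fetch one value by primary key from one table."""
    # Table and column names are trusted constants here, never user input.
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE customer_id = ?", (key,)
    ).fetchone()
    return row[0]


# Key-value style: one lookup per feature group, combined in application code.
kv_vector = (
    kv_lookup("kyc", "risk_score", "cust_42"),
    kv_lookup("recent_txn", "txn_count_24h", "cust_42"),
)

# SQL style: a single query with a join returns the same feature vector.
sql_vector = conn.execute(
    """SELECT k.risk_score, r.txn_count_24h
       FROM kyc k JOIN recent_txn r ON k.customer_id = r.customer_id
       WHERE k.customer_id = ?""",
    ("cust_42",),
).fetchone()

print(kv_vector, sql_vector)
```

Both paths produce the same values; the difference is the number of round trips and how much joining logic ends up in the application.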
RonDB Competitors
Latency Throughput
Availability
Scalability
RonDB
DynamoDB,
Cassandra,
BigTable
Redis
Online Feature
Stores
Demo Time.
github.com/logicalclocks/hopsworks
-
@logicalclocks
-
www.logicalclocks.com
Feature Engineering and Model Training Pipeline - With a Feature Store
KAFKA Train/Test Data
(S3, HDFS, etc)
Online
Application
Data Warehouse
Data Lake
Feature
Engineering
Offline
Feature Store
Model
Training
Model
Serving
Online
Feature Store
Model
Repository
Monitor
Deploy
Feature Vectors
Result Sink (DB)
Batch
Scoring
Batch Access
Deploy
Feature Store

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

Building Hopsworks, a cloud-native managed feature store for machine learning

  • 1. Jim Dowling CEO, Logical Clocks Cloud Native London Meetup, March 3 2021 Building Hopsworks, a cloud-native managed feature store for machine learning
  • 2. Can we make a Monolith fly in the clouds?
  • 3. The Hopsworks Feature Store - Available on all Platforms as Managed, Enterprise, and Community hopsworks.ai (managed platform) Enterprise Hopsworks (self-hosted platform) Community Hopsworks** (self-hosted platform) Runs on any Platform* (On-premise, Cloud, VMs, etc) Runs on any Platform* (On-premise, Cloud, VMs, etc) *Supported operating systems: RHEL/Centos 7.x and Ubuntu 18.04. Minimum Requirements: 32GB RAM, 100GB disk, 8 CPUs. Runs in air-gapped environments. **Community Hopsworks does not include (1) Feature Store Connectors to Third-Party Platforms and (2) SSO with Active Directory/OAuth-2/Azure-AD/AWS. 2016 2018 2020 Only Managed Feature Available today on both AWS and Azure
  • 4. When do I need a Feature Store for Machine Learning, and what is it anyway?
  • 5. Business Problem: Use Machine Learning to Predict Money Laundering Reference: Whitepaper, Webinar
  • 6. What data can I use to solve my Anti-Money Laundering Problem with? Know Your Customer Data Historical Financial Transactions Recent Financial Transactions Data Warehouse Data Lake Message Bus TRAIN SERVE
  • 7. It is not always easy to get access to Enterprise data for training and serving. Know Your Customer Data Historical Financial Transactions Recent Financial Transactions Data Warehouse Data Lake Message Bus TRAIN SERVE
  • 8. What data can I use to make predictions with? Know Your Customer Data Historical Financial Transactions Recent Financial Transactions Data Warehouse Data Lake Message Bus TRAIN SERVE Feature Store
  • 9. Where does the Feature Store fit into the ML Pipeline? FEATURE STORE TRAIN / SERVE FEATURIZE
  • 10. Offline Feature Store - Create Training Data and Batch Predictions
        df = kycFG.select_all().join(rftFG.select_all()).join(hftFG.select_all())
        td = fs.create_training_dataset("precipitation_training_dataset",
                                        version=1,
                                        data_format="tfrecord",
                                        description="Precipitation Training dataset",
                                        splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})
        td.save(df)
    FG=Feature Group https://docs.hopsworks.ai/ Feature Store kycFG rftFG hftFG Training Data (.tfrecord) Model train
  • 11. Online Feature Store - the Data Layer for Operational (Online) Models US-West-1c US-West-1a US-West-1b RonDB2 Model RonDB1 Model RonDB3 Model Online Application 1. JDBC 2. Predict 2-20ms 1. Build Feature Vector Using Online Feature Store 2. Send Feature Vector to Model for Prediction ~5-50ms
  • 12. Code and configuration Data Lake, Warehouse, Kafka Model Registry Feature Engineering Model Serving Model Training Model Deploy Model Monitoring Model Development Features Retrieve Features Log Predictions Training Data Statistics Sync HopsFS Scaleout Metadata Experiment Tracking Programs Feature Statistics A/B Test Model Statistics Serving Statistics Search (Artifacts, Provenance and Metadata) Feature Store Elasticsearch Experiments Hopsworks End-to-End Machine Learning (ML) Pipelines
  • 13. Hopsworks - Develop and Operate ML Applications at Scale APPLICATIONS API DASHBOARDS HOPSWORKS DATASOURCE ORCHESTRATION Airflow BATCH Apache Spark STREAMING Apache Spark Apache Flink HOPSWORKS FEATURE STORE ML DEVELOP AND TRAIN Notebooks as Jobs Tensorflow Scikit-Learn PyTorch Tensorboard FILESYSTEM & METASTORE HopsFS MODEL SERVING AND MONITORING KFServing TF-Serving Flask Data Preparation & Ingestion Experimentation & Model Training Deploy & Productionalize Apache Kafka
  • 18. Moving to the Cloud - Connectors and Integrations Hopsworks Project-Based Multi-Tenant Security API KEY IAM Profile or Federated IAM Role Users Jobs Dev Feature Store Staging Feature Store Prod Feature Store User Login (LDAP, AD, OAuth2, 2FA) databricks SageMaker Kubeflow Amazon EMR Delta Lake Snowflake Amazon S3 Amazon Redshift
  • 19. Making Hopsworks Cloud-Native Hopsworks Open Source Cloud Native Service Open-Source Docker Repository ECR / ACR Kubernetes EKS / AKS Hopsworks Services Rejected Cloud Native Versions Spark-on-YARN Databricks / EMR HopsFS S3 RonDB DynamoDB/Elasticache Kafka Managed Kafka Elastic Open Distro AWS Elastic
  • 20. Developing Hopsworks.ai The first European Company to provide a managed scale-out data and AI platform in the cloud
  • 22. Serverless Platform on AWS - Amplify, Cognito, CloudFront, Lambdas, Route 53, DynamoDB
  • 24. Integration with other Platforms - Databricks
  • 25. Cloud-Native Kubernetes Integration https://www.logicalclocks.com/blog/how-we-secure-your-data-with-hopsworks HopsFS Hive Elastic Kafka Hopsworks Pod User Project Creation Kubernetes (EKS, AKS) Access using X.509/JWT Secrets Project_User X.509 JWT Project_User X.509 JWT Project_User X.509 JWT Jobs UI Project-User Docker Container Project Jobs 2 1 1 API server Scheduler 2
  • 26. DynamoDB Expensive, High Latency (~10ms lookup), Limited Query Support - Reporting a Problem, Quotas, Hotspots
  • 27. RonDB - a new open-source cloud-native distribution of NDB (MySQL Cluster) Inventor of NDB (MySQL Cluster) www.rondb.com RonDB vs Redis - RonDB outperforms on 1 CPU Core and Keeps on Scaling MySQL Cluster (NDB) - the world’s highest throughput transactional datastore 200m ops/second with NDB - world’s fastest key-value store
  • 28. RonDB - the first LATS Database in the Cloud. Launched in private beta Feb 2021. RonDB is a LATS Database: low Latency, high Availability, high Throughput, scalable Storage. < 1ms KV lookup, >10M KV Lookups/sec, >99.999% availability
  • 30. Lessons Learnt (so far) in building a Cloud Native Managed Data/AI Platform: shiny new toys are not always the best
      ● Lambda functions are poor for synchronous events (e.g. request-reply) due to slow response times
          ○ Unsuitable for "web" endpoints: 500-2000 ms response time
          ○ Cold lambdas, but also JS JIT
          ○ Parallel operations are difficult due to lack of support in Lambda
      ● “Amplifeck’d” is a common word on our Slack
      ● SQL > Key Value APIs
  • 33. Feature Engineering and Model Training Pipeline - With a Feature Store KAFKA Train/Test Data (S3, HDFS, etc) Online Application Data Warehouse Data Lake Feature Engineering Offline Feature Store Model Training Model Serving Online Feature Store Model Repository Monitor Deploy Feature Vectors Result Sink (DB) Batch Scoring Batch Access Deploy Feature Store
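
The slides describe two read paths over the same features: an offline path that joins feature groups into training data with a 70/20/10 train/test/validate split (slide 10), and an online path that looks up one entity's features by key to build a feature vector for the model (slide 11). The sketch below illustrates that idea in plain Python only. It does not use the Hopsworks API: the in-memory dictionaries, the feature names (`age`, `recent_tx_count`, `avg_tx_amount`), and the helper functions are all hypothetical stand-ins chosen for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-in for a feature group: primary key -> feature values.
FeatureGroup = Dict[int, Dict[str, float]]
Row = Tuple[int, Dict[str, float]]


def join_feature_groups(*groups: FeatureGroup) -> FeatureGroup:
    """Inner-join feature groups on their shared primary key."""
    if not groups:
        return {}
    keys = set(groups[0])
    for group in groups[1:]:
        keys &= set(group)
    joined: FeatureGroup = {}
    for key in sorted(keys):
        row: Dict[str, float] = {}
        for group in groups:
            row.update(group[key])
        joined[key] = row
    return joined


def split_rows(rows: List[Row], splits: Dict[str, float],
               seed: int = 0) -> Dict[str, List[Row]]:
    """Offline path: shuffle rows and cut them into named splits by fraction."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    names = list(splits)
    result: Dict[str, List[Row]] = {}
    start = 0
    for i, name in enumerate(names):
        # The last split takes the remainder so that no row is dropped.
        if i == len(names) - 1:
            end = len(shuffled)
        else:
            end = start + round(splits[name] * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


def get_feature_vector(store: FeatureGroup, key: int,
                       feature_order: List[str]) -> List[float]:
    """Online path: single-key lookup, returned in the order the model expects."""
    row = store[key]
    return [row[name] for name in feature_order]


# Toy data for ten customers (all values are made up).
kyc = {i: {"age": 20.0 + i} for i in range(10)}
rft = {i: {"recent_tx_count": float(i)} for i in range(10)}
hft = {i: {"avg_tx_amount": 100.0 * i} for i in range(10)}

joined = join_feature_groups(kyc, rft, hft)
datasets = split_rows(list(joined.items()),
                      {"train": 0.7, "test": 0.2, "validate": 0.1})
vector = get_feature_vector(joined, 3,
                            ["age", "recent_tx_count", "avg_tx_amount"])
```

In this toy version both paths read the same dictionary. In the architecture on the slides, the offline and online stores are separate systems fed by the same feature engineering step, which is what keeps training and serving features consistent.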