Hopsworks
The Platform for Data-Intensive AI
Steffen Grohsschmiedt
Head of Cloud
steffen@logicalclocks.com
@grohsschmiedt
Hopsworks Timeline
“If you’re working with big data and Hadoop, this one paper could repay your
investment in the Morning Paper many times over.... HopsFS is a huge win.”
- Adrian Colyer, The Morning Paper
Timeline: 2017, 2018, 2019
World’s fastest Hadoop: published at USENIX FAST with Oracle and Spotify
Winner of the IEEE Scale Challenge 2017 with HopsFS (1.2M ops/sec)
World’s First #1: GPUs-as-a-Resource support in the Hopsworks platform
World’s First #2: Distributed file system to store small files in metadata on NVMe disks
World’s First #3: Open source Feature Store for machine learning
World’s most scalable filesystem with multi-data-center availability
Example workflow in Hopsworks at Scale
1. Insert 1M images (<100 KB each) in seconds
2. Train a DNN classifier using 100s of GPUs
3. Run a Spark job to identify all objects in the 1M images and add the image annotations (JSON) as extended metadata to HopsFS
4. Ask “show me the images with >3 bicycles” and get a sub-second response
Data scientists: Do it all in Jupyter notebooks and Python (if you want)!
Ops folks: Remove the image directory, and Elasticsearch is cleaned up automatically!
[Figure: workflow diagram with the labels Images, HopsFS, Train DNN, Elastic, and Image Search App, annotated with steps 1 to 4]
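Step 4 of the workflow can be sketched as follows. This is a minimal illustration, not the Hopsworks implementation: it assumes each image annotation is a JSON document with a file path and a list of detected object labels, and the field names (`path`, `objects`, `counts.bicycle`) and file paths are hypothetical. The first two functions filter annotations in plain Python; `range_query` builds a request body in Elasticsearch's query DSL (a `range` query with `gt`) that would express the same filter if per-label counts were indexed under the assumed field.

```python
import json
from collections import Counter


def count_objects(annotation_json):
    """Count detected objects per label in one annotation (hypothetical schema)."""
    annotation = json.loads(annotation_json)
    return Counter(annotation["objects"])


def images_with_more_than(annotations, label, threshold):
    """Return paths of images with more than `threshold` objects of `label`."""
    matches = []
    for raw in annotations:
        path = json.loads(raw)["path"]
        if count_objects(raw)[label] > threshold:
            matches.append(path)
    return matches


def range_query(field, threshold):
    """Build an Elasticsearch query DSL body; the field name is an assumption."""
    return {"query": {"range": {field: {"gt": threshold}}}}


# Two made-up annotations standing in for the output of the Spark job.
annotations = [
    json.dumps({"path": "/images/a.jpg", "objects": ["bicycle"] * 4 + ["car"]}),
    json.dumps({"path": "/images/b.jpg", "objects": ["bicycle", "person"]}),
]

print(images_with_more_than(annotations, "bicycle", 3))
print(json.dumps(range_query("counts.bicycle", 3)))
```

In a real deployment the filtering would happen inside the search index rather than in Python, which is what makes a sub-second response over a million images plausible.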
Hopsworks hides the Complexity of Deep Learning
[Figure labels: Data, Model φ(x), Prediction, surrounded by Data Collection, Feature Engineering, Data Validation, Hardware Management, Distributed Training, Hyperparameter Tuning, Pipeline Management, Model Serving, A/B Testing, Monitoring]
*Figure from “Technical Debt in Machine Learning Systems”, Google research paper
Hopsworks hides the Complexity of Deep Learning
[Same figure as on the previous slide, with the added label Hopsworks Feature Store]
Hopsworks hides the Complexity of Deep Learning
[Same figure as on the previous slide, with the added labels Hopsworks Feature Store and Hopsworks REST API]
What is Hopsworks?
Efficiency & Performance
  Feature Store: Data warehouse for ML
  Distributed Deep Learning: Faster with more GPUs
  HopsFS: NVMe speed with Big Data
  Horizontally Scalable: Ingestion, DataPrep, Training, Serving
Usability & Process
  Jupyter/Python Development: Notebooks in pipelines
  Version Everything: Code, Infrastructure, Data
  Model Serving on Kubernetes: TF Serving, MLeap, SkLearn
  End-to-End ML Pipelines: Orchestrated by Airflow
Security & Governance
  Secure Multi-Tenancy: Project-based restricted access
  Encryption At-Rest, In-Motion: TLS/SSL everywhere
  AI-Asset Governance: Models, experiments, data, GPUs
  Data/Model/Feature Lineage: Discover/track dependencies
Which services require Distributed Metadata (HopsFS)?
[Same feature overview as on the previous slide: Efficiency & Performance, Usability & Process, Security & Governance]
End-to-End ML Pipelines in Hopsworks
End-to-End Pipelines can be factored into stages
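One way to picture factoring a pipeline into stages is as a dependency graph that a scheduler runs in order. The sketch below is an illustration using only the Python standard library, not Airflow or Hopsworks code: the stage names are borrowed from the slides (Ingestion, DataPrep, Training, Serving), the dependencies between them are assumed, and `run_stage` is a hypothetical callback standing in for whatever each stage does.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (assumed dependencies).
STAGES = {
    "ingestion": set(),
    "data_prep": {"ingestion"},
    "training": {"data_prep"},
    "serving": {"training"},
}


def run_pipeline(stages, run_stage):
    """Run every stage after its dependencies and return the execution order."""
    order = list(TopologicalSorter(stages).static_order())
    for name in order:
        run_stage(name)
    return order


# Record which stages ran; a real stage would execute a notebook or a job.
executed = []
order = run_pipeline(STAGES, executed.append)
print(order)
```

Keeping stages separate like this means one stage can be rerun or replaced without touching the others, as long as its inputs and outputs stay the same.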
Typical Feature Store Pipelines
Hopsworks’ Feature Store
Dev View: Pipelines of Jupyter Notebooks in Airflow
Hopsworks development environment
First Class Python: Conda in the Cluster
[Figure labels: Conda Repo, Hopsworks Cluster]
No need to write Dockerfiles
Demo
How to get started with Hopsworks?
Register for a free account at: www.hops.site
Images available for AWS, GCE, VirtualBox.
Reach us: @hopsworks
https://www.logicalclocks.com/
https://github.com/logicalclocks/hopsworks
