AI-generated NFTs on FVM
with Bacalhau Stable Diffusion
Bacalhau
Ally Haire
Developer Relations
Engineer
DeveloperAlly
Building a Text to Image
Model (Stable Diffusion - GPU)
Agenda aka the timestamps…
● Let’s see what we’re building in action!
● Bacal… what? The whys and hows of Bacalhau
● Brief intro to Machine Learning and Stable Diffusion (text->image model)
● Show me the code! Coding up a text-to-image script
● Running on Bacalhau
Stable Diffusion on GPU
with Bacalhau
Making Open Source Dall-E!
The example…
https://docs.bacalhau.org/examples/model-inference/stable-diffusion-gpu/
The example in action:
Running our Open Source Dall-E…
All anyone needs to do to run this example at any time is install
Bacalhau (one line of code) and run this docker image!!!
The Docker image saved on the Bacalhau registry
The Python script on Docker to run
The output folder to save to
The text input flag in the Python script
The Bacalhau CLI Docker command
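The labels above point at the parts of one command. Here is a rough Python sketch of how those parts could fit together. It is not the exact command from the slide, which survives here only as labels. The image name, the script name and the `--o` / `--p` flags are assumptions modelled on those labels and on the linked docs example, so check the docs page for the current values.

```python
import shlex

# Assumed image name for illustration; check the docs example for the real one.
IMAGE = "ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1"


def build_bacalhau_command(prompt, image=IMAGE, script="main.py",
                           output_dir="./outputs", gpus=1):
    """Return the argv list for a `bacalhau docker run` job."""
    return [
        "bacalhau", "docker", "run",   # the Bacalhau CLI Docker command
        "--gpu", str(gpus),            # ask the network for a GPU node
        image,                         # the Docker image to run
        "--",                          # what follows runs inside the container
        "python", script,              # the Python script inside the image (assumed name)
        "--o", output_dir,             # output folder to save to (assumed flag)
        "--p", prompt,                 # text input for the model (assumed flag)
    ]


if __name__ == "__main__":
    command = build_bacalhau_command("cod swimming through data")
    # Print a copy-pasteable shell command rather than running it.
    print(shlex.join(command))
```

Building the command as a list keeps the prompt as a single argument even when it contains spaces or quotes.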
So many cool cod (pun intended) …
Why Bacalhau?
A Primer on Building Machine Learning Models
How do we make this?
That one time… when I asked ChatGPT what I needed to get started with ML models…
FYI: ChatGPT is a large language model fine-tuned with reinforcement learning from human feedback
Ahh the paradox of asking ML for ML help ;)
Can I do this locally or on cloud though?
You could run this example locally. In fact, I did manage to get it running on my Mac M1 with a few code changes, but computing the image from text took a good 20 minutes or more.
You could try to run this in the cloud with more computing power, though I wasn’t able to find anywhere that provided a free-tier GPU to use for it (and I wasn’t willing to pay just to try something!).
Bacalhau Architecture
The decentralised computation network
Bacal.. - what ??
Bacalhau is a network of open compute resources available to serve any data processing workload
- It’s simple to use (you don’t need an AI degree!)
- It requires minimal operational overhead or setup
- It follows decentralised-first (or edge-first) principles
- It aims to provide efficient distributed computation with batched tasks
Learn more about Bacalhau!
@BacalhauProject
https://youtu.be/RZopDyTJ1pk
Bacalhau & FVM?
FVM - programmable data on small amounts of state
Bacalhau - computation over this or any data, including big data, with support for GPUs
Future: Bacalhau + FVM - calling Bacalhau in your smart contracts!
Bacalhau Platform Architecture
Bacalhau provides a platform for public, transparent, and optionally verifiable computation.
It enables users to run arbitrary Docker containers and WebAssembly (wasm) images as tasks against data stored in the InterPlanetary File System (IPFS).
It operates as a peer-to-peer network of nodes where each node has both a requester and a compute component.
Bacalhau System Components
● Requester node (component)
● Compute node (component)
● Transport layer (interface)
● Executor (interface)
● Storage Provider (interface)
● Verifier (interface)
● Publisher (interface)
Bacalhau Job Lifecycle
Job Submission → Job Acceptance → Job Execution → Job Verification → Job Publishing
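To make the order of the lifecycle concrete, here is a tiny, purely illustrative Python model of the five stages. It is not Bacalhau's API: the class and function names are made up for this sketch, and a real network can also reject or fail a job along the way.

```python
from enum import Enum


class JobStage(Enum):
    """The five lifecycle stages named on the slide, in order."""
    SUBMISSION = 1
    ACCEPTANCE = 2
    EXECUTION = 3
    VERIFICATION = 4
    PUBLISHING = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None after publishing."""
    stages = list(JobStage)
    index = stages.index(stage)
    return stages[index + 1] if index + 1 < len(stages) else None


def walk_lifecycle():
    """Walk a job through every stage in order and return the stage names."""
    visited = []
    stage = JobStage.SUBMISSION
    while stage is not None:
        visited.append(stage.name)
        stage = next_stage(stage)
    return visited


if __name__ == "__main__":
    print(" -> ".join(walk_lifecycle()))
```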
Err… Stable Diffusion?
No, you don’t need to be a data scientist!
AI and Machine Learning Quick Intro
Artificial Intelligence is an umbrella term for a few different concepts. AI is any technique that allows computers to bring meaning to data in similar ways to a human.
Machine learning is a subset of AI in which a system learns from data rather than being explicitly programmed. It has 3 main types:
• Supervised learning
  • Regression & Classification Algorithms
• Unsupervised learning
  • Clustering & Association Algorithms
• Reinforcement learning
  • Value, Policy or Model Based reinforcement methods
Deep learning is a subset of machine learning that uses multi-layered neural networks to teach itself to perform a specific task.
Stable what now…?
Generically, diffusion is what happens when you put a couple of drops of dye into a bucket of water. Given time, the dye randomly disperses and eventually settles into a uniform distribution which colours all the water evenly.
In computer science, you define rules for your (dye) particles to follow and the medium this takes place in.
In the example we’re doing, Stable Diffusion is a machine learning model used for text-to-image processing (like Dall-E). It is based on a diffusion probabilistic model and uses a transformer-based text encoder to guide the generation of images from text.
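The dye analogy maps onto the "forward" half of a diffusion model: noise is mixed into an image step by step until only noise is left, and the model is trained to undo that. The sketch below shows just that noising step using the standard library. It is a toy under stated assumptions, not Stable Diffusion itself: the linear schedule and its start and end values are illustrative, and the four-number "image" is a stand-in.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [1.0, -1.0, 0.5, -0.5]  # a tiny stand-in "image"
    schedule = alpha_bar_schedule(1000)
    for step in (0, 499, 999):
        noisy = add_noise(image, schedule[step], rng)
        print(step, round(schedule[step], 4), [round(v, 2) for v in noisy])
```

As the step number grows, the share of the original signal shrinks towards zero, which is the "dye evenly spread through the water" end state.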
Building the Scripts
Stable Diffusion on GPU with Google Colab
Tools & Environment
For this example we’ll need
- Google Colab (for testing our scripts)
  - https://colab.research.google.com/
- Optional: Docker (if you want to deploy your own Docker image)
Run any of our docs examples in Google Colab!
Get & Start a Colab
You’ll need to add Colab from the Google Marketplace if you want to create your own notebooks (you can run our docs examples without this though!)
Create a new notebook
Go to https://colab.research.google.com/
Google Colab setup
We’re running on a GPU for this example, so we want to set our (VM) runtime environment to GPU
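Once the runtime is switched to GPU, it is worth confirming that a GPU is actually visible before running anything heavy. One way to do that, sketched here as an assumption rather than as part of the original notebook, is to ask the `nvidia-smi` tool for the GPU names. The helper name is made up for this example.

```python
import shutil
import subprocess


def nvidia_gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # tool not installed, so no NVIDIA GPU runtime here
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # tool present but no usable GPU or driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = nvidia_gpu_names()
    print("GPUs found:", names if names else "none (check the runtime type)")
```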
Show me the code
Install some of our Python dependencies
- Fork of a Keras/TensorFlow implementation of Stable Diffusion: the text-to-image library
- Drivers for NVIDIA GPUs
- Lib for progress bars
- TensorFlow library & add-ons, unicode fixer
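The install commands themselves did not survive on this slide, only their descriptions. A quick way to see which pieces are still missing after installing is sketched below. The import names are assumptions inferred from those descriptions (for example `ftfy` for the unicode fixer and `tqdm` for progress bars), so adjust them to whatever you actually installed.

```python
import importlib.util

# Import names are assumptions inferred from the slide's descriptions.
REQUIRED_MODULES = {
    "stable_diffusion_tf": "Keras/TensorFlow Stable Diffusion fork",
    "tensorflow": "TensorFlow",
    "tensorflow_addons": "TensorFlow add-ons",
    "ftfy": "unicode fixer",
    "tqdm": "progress bars",
}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the sorted names of modules that cannot be found for import."""
    return sorted(
        name for name in modules if importlib.util.find_spec(name) is None
    )


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Still to install:", ", ".join(missing))
    else:
        print("All dependencies found")
```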
text2image.py
This is the basic text-to-image script. It uses a fork of a Keras/TensorFlow implementation, generates the images from a given text string and finally displays the generated image.
The ML weights are pre-trained and come with the library
We can do better…
stable-diffusion.py
This script adds input parameters to our text2image script and saves the output images to a file
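The script itself is not reproduced on this slide, so here is a minimal sketch of the shape it describes: parse a prompt and an output folder, generate, then save to a file. The `--p` and `--o` flag names follow the labels earlier in the deck. The `Text2Image` import and its `generate` arguments are an assumption about the Keras/TensorFlow fork's API and should be checked against the installed library. The helper names are made up for this example.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the prompt (--p) and output folder (--o) from the command line."""
    parser = argparse.ArgumentParser(description="Generate an image from a text prompt.")
    parser.add_argument("--p", dest="prompt", required=True, help="text prompt")
    parser.add_argument("--o", dest="output_dir", default="./outputs", help="output folder")
    return parser.parse_args(argv)


def output_path(output_dir, prompt):
    """Build a file name from the prompt: 'Cod in space!' -> 'cod_in_space.png'."""
    safe = "".join(c if c.isalnum() else "_" for c in prompt.lower()).strip("_")
    return Path(output_dir) / f"{safe[:50] or 'image'}.png"


def main(argv=None):
    args = parse_args(argv)
    # Assumed API of the Keras/TensorFlow fork; verify against the installed library.
    from stable_diffusion_tf.stable_diffusion import Text2Image
    from PIL import Image

    generator = Text2Image(img_height=512, img_width=512, jit_compile=False)
    images = generator.generate(args.prompt, num_steps=50, batch_size=1)

    target = output_path(args.output_dir, args.prompt)
    target.parent.mkdir(parents=True, exist_ok=True)
    Image.fromarray(images[0]).save(target)
    print(f"Saved {target}")


# To use this as a script, call main() at the bottom of the file.
```

The heavy imports sit inside `main` so that the argument parsing can be tried out on a machine without a GPU or TensorFlow installed.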
Docker build
Dockerfile
Build & Push Docker Image
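The Dockerfile and commands are shown as images in the deck, so they are not reproduced here. As a hedged sketch of the build-and-push step, the two standard Docker commands can be assembled like this. The image tag in the usage line is a hypothetical placeholder, and the helper names are made up for this example.

```python
import subprocess


def docker_build_and_push_commands(image, context="."):
    """Return the argv lists for building an image and pushing it to a registry."""
    return [
        ["docker", "build", "-t", image, context],  # build from the Dockerfile in `context`
        ["docker", "push", image],                  # push the tagged image
    ]


def run_all(commands, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical tag; replace with your own registry, user and version.
    run_all(docker_build_and_push_commands("your-user/stable-diffusion-gpu:0.0.1"))
```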
Build with Bacalhau
Stable Diffusion on GPU
Run on Bacalhau
Pizza for everyone!
Join the discussion:
- Twitter @BacalhauProject
- YouTube @bacalhauproject
- Slack #bacalhau @filecoinproject
- GitHub @bacalhau.org
- Forum github.com/filecoin-project/bacalhau/discussions
See more examples:
- docs.bacalhau.org
Get Involved in the future of data!
Alan Kay - Computer Scientist
“The best way to predict
the future is to create it”
The Future of Filecoin is Computable
The Future of Filecoin is You.

Bacalhau: Stable Diffusion on a GPU


Editor's Notes

  1. G’day Devs and Fil-ders!! I’m Ally and I’m a Developer Relations Engineer working with the Filecoin Foundation and Protocol Labs. Today I want to introduce you to a project we are very excited about and that will hopefully help democratise the future of data processing - Bacalhau. I’m going to be showing you a really cool example of how to build your own text-to-image code and then run it on Bacalhau, which, for those that haven’t heard of it, is not just a Portuguese fish, but a peer-to-peer open computation network!
  3. Warning from https://docs.oakhost.net/tutorials/tensorflow-apple-silicon/ Caveat on running the first example: this may not work on your machine, which is one of the exact reasons we have Bacalhau for large data!! Options are… (Colab notebook, paid cloud environment for testing)
  4. So here are the timestamps for those that want to go directly to what they’re interested in. First we’ll see this fully built model in action on Bacalhau, then I’ll chat a little bit about what Bacalhau is, how it works and what advantages it can offer you. I’ll then give a brief breakdown of what a Stable Diffusion model is and how it fits into the machine learning world. And then I’ll move on to walking through how you can create this example end-to-end and run it on the Bacalhau network.
  5. You can find the example I’m going through today in the Bacalhau docs, along with a host of other awesome examples you can try out for yourself. In this video we’re going to be building, testing and running machine learning code with a machine learning model called Stable Diffusion (more on that later), which will take any text you provide it and transform that text into a funky and original image. Pretty cool, right? I was excited to do this video and see how it works.
  6. So, before we get started, let’s take a sneak peek at what our final example looks like, for all the visual learners out there :) And to clarify before we start - you don’t need any prior knowledge of Bacalhau or data science, or any special developer environment or hardware, to join in with me here! (install Bacalhau then run the docker image on Bacalhau in Google Colab - explain it later though) We will be using Python, though I’ll walk you through everything the code does, so if you have any sort of coding background, you’ll be more than capable of doing this yourself! The main point here is “If you can write code in any language - Python, Go, JavaScript, R - and want to use ANY type of data, then Bacalhau is for you.” And…. even if you can’t…. if you can open the terminal on your computer and copy-paste 2 lines of code into it - one to install Bacalhau and the other to run this example that’s already been ‘uploaded’ to the network…. (image should be loaded by now)... well, then you can go ahead and use this on your machine and not even worry about the rest of this tutorial! Though I hope you stick around to learn how to make more fun images like this one and perhaps get some inspiration for how you’d go about building your own data projects on Bacalhau (and show them off to us!!)
  7. Building and testing machine learning models can be a tricky business and this is mostly because of the compute power you need to train and run them. Like most development, you need a few things to get started … If you’ve never built one before - this is the section for you!
  8. - like knowing a programming language for writing and running your machine learning code, and setting up a good developer environment for the task you’re looking to do. In fact, when I asked fellow ML model ChatGPT what I needed to get started with machine learning, it told me I needed:
  - A programming language for writing and running your machine learning code. (OK, I know some Python - check!)
  - A machine learning framework or library that provides pre-built algorithms, tools, and other resources for building and training machine learning models. (Technically it’s not ESSENTIAL - you could build your own model implementation, but hey, that’s hard and would be like building your own sort function. Luckily that’s not necessary - there are several open source libraries out there, like TensorFlow, which we’ll use here, as well as PyTorch and scikit-learn, which you could also play around with. We’d love to see examples using these to include in a community cookbook if you do make one!)
  - A data management and analysis tool for managing and working with the data that you will use to train and evaluate your machine learning model. This includes spreadsheet programs like Microsoft Excel and specialised data analysis tools like Pandas or NumPy. (In this case, as we are just using a TensorFlow implementation, the model has been pre-trained for us - so we won’t need to do any management or analysis of data to create it.) (FYI - check out the landscape section in our docs for a comparison of the compute landscape too!)
  - A development environment or integrated development environment (IDE) for writing and managing your machine learning code. This could be a simple text editor like VS Code, or a more advanced IDE like PyCharm or Jupyter Notebook. (I’m going to use Google Colab here alongside the VS Code editor, as unfortunately Google Colab does not support docker images - which we’ll need after testing our Python code.)
  - AND finally…… A computing platform or cloud service for running your machine learning code and training your model. This could be your local machine, a dedicated server, or a cloud computing platform.
  And this last one is where things get complicated - even if you are familiar with all the other items on this list… because machine learning models chew up A LOT of computing power and can take a very long time to run. If you thought compiling a large code set or waiting for an Ethereum transaction to be included in a block was time consuming….. well… machine learning model processing is what ping pong tables in the office were really made for.
  9. Using your local machine for small examples is possible - in fact I did manage to get this particular example working on my (very unhappy about it) Mac M1. However, once you start doing bigger data processing, you are going to need more gas (eth analogy intended), and if you don’t have a dedicated server lying around the house, you’re going to need a virtual machine on a cloud computing platform. Not only is that inefficient, because the data sits an unknown distance from the machine doing the computation, but it can also get costly fast. Luckily, these are some of the problems Bacalhau is trying to solve. Bacalhau makes data processing and computation open and available to everyone and speeds up processing times, firstly by using batch processing across multiple nodes and secondly by putting the processing nodes where the data lives.
  10. As I mentioned a bit earlier - Bacalhau is a decentralised computation network which provides a platform for public, transparent and optionally verifiable computation. It was originally conceived to bring useful compute resources to data stored on the IPFS & Filecoin network - bringing the same benefits of open collaboration on datasets stored in IPFS and Filecoin to generic compute tasks. I recommend this video by project lead David Aronchick if you want to hear more too - check it out on the BacalhauProject YouTube channel.
  11. And yes - for those of you that are following the Filecoin starmap - it will go hand-in-hand with the Filecoin Virtual Machine, Filecoin’s EVM-compatible layer one. Like most on-chain computation, FVM offers programmable data over small amounts of state, while Bacalhau provides you with compute over that data or any data, including big data, with support for GPUs. In the not too distant future you should even be able to leverage it by calling Bacalhau from your smart contracts - giving you the ability to interact directly with data stored on the Filecoin blockchain - a big win for developer experience and users! If you’re interested in this, keep an eye on Project Frog… a POC the team is working on now. https://pl-strflt.notion.site/Project-Frog-FVM-Stable-Diffusion-Demo-6cb6c2f5c5614394a5468a5253b6c812 (need QR)
  12. So, how does Bacalhau work? As I mentioned, Bacalhau is a peer-to-peer network of nodes that enables users to run arbitrary Docker containers or WebAssembly (wasm) images as tasks against data that is stored in IPFS (the InterPlanetary File System), providing a platform for public, transparent, and optionally verifiable computation. This is known as Compute Over Data, or CoD for short - which, fun fact, is where Bacalhau’s name comes from, as Bacalhau is Portuguese for cod.
  Bacalhau operates as a peer-to-peer network of nodes where each node has both a requester and a compute component. To interact with the cluster, Bacalhau CLI requests are sent to a node in the cluster (via JSON over HTTP), which then broadcasts messages over the transport layer to other nodes in the cluster. All other nodes in the network are connected to the transport layer and as such have a shared view of the world.
  Architecture: Transport layer (interface); Requester node (component); Compute node (component); Executor (interface); Storage Provider (interface); Verifier (interface); Publisher (interface)
  Job Lifecycle: Job Submission; Job Acceptance; Job Execution; Verification; Publishing
  Networking
  Input/Output volumes
  13. Each node in the Bacalhau network has both a requester and a compute component. To interact with the cluster, Bacalhau CLI requests are sent to a node in the cluster (via JSON over HTTP), which then broadcasts messages over the transport layer to other nodes in the cluster. All other nodes in the network are connected to the transport layer and so have a shared view of the world.
  Architecture: Transport layer (interface); Requester node (component); Compute node (component); Executor (interface); Storage Provider (interface); Verifier (interface); Publisher (interface)
  14. This means that when a job is submitted to Bacalhau, it is forwarded to a Bacalhau cluster node which acts as the requester node. This requester node broadcasts the job to the other nodes in the peer-to-peer network, which can bid on the job - creating a job deal market. The job deal has a concurrency flag - meaning you can set the number of nodes you want to perform this job concurrently. It also includes a confidence property, which defines how many verification proposals must agree for the job to be deemed successful, and a min-bids property, which defines the number of bids that must have been made before any are accepted.
  Depending on the flags given to the requester node (which can include concurrency, confidence, minimum bids before acceptance, reputation, locality, cost, hardware resources and even volumes, such as IPFS CIDs), the requester node accepts one or more matching job bids. The accepted bids are then executed by the relevant compute nodes, using the storage providers that the executor has mapped in - for example the docker executor and IPFS storage volumes. Once the job is complete, a verification proposal is generated which, if accepted, leads to the raw results folder being published by the compute node (the default publisher is Estuary).
  There is a lot more flexibility to this process, but the main thing to understand is that Bacalhau gives you, the user, the ability to execute a job where the data is already hosted, across a decentralised network of servers that store data. That saves you time, money and operational overhead, and also gives you referenceable and reproducible jobs that are easy to manage and maintain. Phew, now that we understand what’s going on under the hood, let’s take a quick look at what Stable Diffusion is before we dive into the code here!
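To make the bidding flow a bit more concrete, here is a toy sketch of the requester-side logic described above. It is not Bacalhau's actual implementation: the `Bid` class and the `accept_bids` and `is_verified` helpers are hypothetical names invented for illustration, and the real network weighs more factors (reputation, locality, cost and so on).

```python
from dataclasses import dataclass


@dataclass
class Bid:
    """A hypothetical bid from a compute node (illustration only)."""
    node_id: str
    has_gpu: bool


def accept_bids(bids, concurrency, min_bids, needs_gpu=False):
    """Wait until `min_bids` bids have arrived, then accept up to
    `concurrency` bids from nodes that match the job's requirements."""
    if len(bids) < min_bids:
        return []  # not enough bids yet, keep waiting
    matching = [b for b in bids if b.has_gpu or not needs_gpu]
    return matching[:concurrency]


def is_verified(result_hashes, confidence):
    """The job passes when at least `confidence` results agree."""
    if not result_hashes:
        return False
    most_common = max(result_hashes.count(h) for h in set(result_hashes))
    return most_common >= confidence
```

So with three bids in hand, a job asking for `concurrency=2` and a GPU would be handed to the first two GPU nodes, and with `confidence=2` it would only be deemed successful if two of the returned results match.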
  15. Essentially, machine learning is a subset of AI focused on having computers provide insights into problems without explicitly programming them. There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Deep learning, the category Stable Diffusion falls under, is a subset of machine learning in which a model teaches itself to perform a specific task - in this case converting a text input to an image output.
  16. And Stable Diffusion is the particular model we’ll use for doing this text-to-image processing (DALL-E 2 is also diffusion-based, though it is a different, closed model). It is based on a diffusion probabilistic model that uses a transformer to generate images from text. In this example we’ll be using a pre-trained model in TensorFlow - Google’s open source machine learning library. Now, you don’t really need to worry about the ins-and-outs of how Stable Diffusion works, unless, like me, you’re curious, and if so - I encourage you to dig in further - there are lots of resources around to explain it! All you really need to know here, though, is that you can create your own text-to-image processor to run on the Bacalhau network and you don’t need a data science degree or any special skills to do it. In fact, I’m hoping you’ll be inspired to make your own models and projects with it! This example aims to show how easy it is to use Stable Diffusion on a GPU with the Bacalhau network. So let’s get on to it! Show me the code!!
  17. Yay the coding part!
  18. I’ll be using Google Colab to go through this example. For those that may not have come across it before, Google Colaboratory allows you to write, execute and share computation files - like our Python and bash scripts. It runs in your browser and executes the code on a private virtual machine which you can configure. It’s based on the open source Jupyter notebook, which is used extensively in data science fields, and it stores any notebooks you make in your Google Drive, or you can load them from GitHub. And its free tier works great for us here! By the way, you can run any of the examples in the Bacalhau docs in Google Colab too! [demo on setup]
  19. If you want to follow along with me here - go ahead and set up Google Colab for yourself. Alternatively, you can just open the shared Colab from the docs site without the need to install anything.
  20. This is the first screen you’ll see if you open the Colab URL. You can create a new notebook for this example there.
  21. Since this example uses a GPU-based environment, we’ll just switch our runtime environment from the Runtime menu to run on a GPU. We don’t need a premium GPU for this one :)
  22. Alrighty - so awesome! You have a fresh notebook. Let’s get started!
      curl -sL https://get.bacalhau.org/install.sh | bash
      bacalhau version
      pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
      pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
      pip install tqdm
      apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2
  23. Alrighty - so awesome! You have a fresh notebook. Let’s get started! This is the first script.
      pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
      pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
      pip install tqdm
      apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2
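With those packages installed, a text-to-image script could look roughly like the sketch below. Treat it as a sketch under assumptions rather than the exact script from the docs: the `--p` and `--o` flag names, the default prompt, the `image0.png` file name and the `StableDiffusion(...)` / `generate(...)` arguments are all recalled from the library's README and the example page, so check them against the version you install.

```python
import argparse


def parse_args(argv=None):
    """Command-line flags for the script (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description="Stable Diffusion text-to-image")
    parser.add_argument("--p", dest="prompt", default="cod swimming through data",
                        help="text prompt to turn into an image")
    parser.add_argument("--o", dest="output_dir", default="./outputs",
                        help="folder to save the generated image in")
    parser.add_argument("--steps", type=int, default=50,
                        help="number of diffusion steps")
    return parser.parse_args(argv)


def generate(prompt, output_dir, steps):
    """Run the pre-trained model and save the first image it returns."""
    # Imported lazily so the flag parsing works even without TensorFlow.
    import os
    from PIL import Image
    from stable_diffusion_tf.stable_diffusion import StableDiffusion

    generator = StableDiffusion(img_height=512, img_width=512, jit_compile=False)
    images = generator.generate(
        prompt,
        num_steps=steps,
        unconditional_guidance_scale=7.5,
        temperature=1,
        batch_size=1,
    )
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "image0.png")
    Image.fromarray(images[0]).save(path)
    return path


def main(argv=None):
    args = parse_args(argv)
    return generate(args.prompt, args.output_dir, args.steps)
```

In a notebook you would call `main(["--p", "a cod in space", "--o", "./outputs"])`; saved as a script, add a final `main()` call. The first run downloads the pre-trained weights, which takes a while.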
  24. Yay the coding part!
  25. And I think these look better than Patrick Collins’ pizza ;P
  26. Insert an openlinks QR code here
  27. Bacalhau. Each section should contain: WHAT - what is this project; WHY / HOW - how does it improve Filecoin access / storage / usability etc.; WHEN - latest projects; WHERE - where are the building gaps; CALL - any call to action (why does this involve you - may be as above)
  29. First we’ll download and take a look at just one of the IPFS files locally… (there is far more data than this in the overall collection though - this represents only one chunk, about 100,000 blocks of Ethereum data). This code simply gets the IPFS tar file through an HTTP gateway and un-tars it (decompresses it):
      wget -q -O file.tar.gz https://w3s.link/ipfs/bafybeifgqjvmzbtz427bne7af5tbndmvniabaex77us6l637gqtb2iwlwq
      tar -xvf file.tar.gz
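If you would rather stay in Python than shell out to wget and tar, the same two steps can be sketched with the standard library. The helper names `fetch` and `unpack` are made up for this example.

```python
import tarfile
import urllib.request


def fetch(url, archive_path):
    """Download `url` to `archive_path` (the wget step)."""
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def unpack(archive_path, dest="."):
    """Extract the archive into `dest` and return the member names
    (the tar -xvf step). Only unpack archives you trust."""
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        names = tar.getnames()
        tar.extractall(dest)
    return names
```

For the chunk above that would be `fetch("https://w3s.link/ipfs/bafybeifgqjvmzbtz427bne7af5tbndmvniabaex77us6l637gqtb2iwlwq", "file.tar.gz")` followed by `unpack("file.tar.gz")`.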
  30. output_850000
  31. We’ll use pandas to create some columns for this data and plot it with matplotlib. Pandas is an open source data analysis tool built on top of the Python programming language. We’re using it here to clean up the Ethereum data from the CSV file found in our output directory. We can either run this directly from a python3 terminal instance, or by creating a script in our working directory and running that.
      import glob
      import pandas as pd
      import matplotlib.pyplot as plt

      # Pick the first transactions CSV from the extracted chunk
      file = glob.glob('output_*/transactions/start_block=*/end_block=*/transactions*.csv')[0]
      print("Loading file %s" % file)
      df = pd.read_csv(file)

      # Give each column an explicit type
      df['value'] = df['value'].astype('float')
      df['from_address'] = df['from_address'].astype('string')
      df['to_address'] = df['to_address'].astype('string')
      df['hash'] = df['hash'].astype('string')
      df['block_hash'] = df['block_hash'].astype('string')
      # block_timestamp is in Unix seconds
      df['block_datetime'] = pd.to_datetime(df['block_timestamp'], unit='s')
      df.info()

      # Sum the transaction value per day and plot it
      df[['block_datetime', 'value']].groupby(pd.Grouper(key='block_datetime', freq='1D')).sum().plot()
      plt.show()
  32. This is cool - but! the code here only inspects the daily trading volume of Ethereum for a single chunk (100,000 blocks) of data. We can do better - We can use the Bacalhau client to download the data from IPFS and then run the analysis on the data in the cloud. This means that we can analyse the entire Ethereum blockchain without having to download it locally.
  33. To run jobs on the Bacalhau network you need to package your code. In this example, the code is packaged as a Docker image. Don’t worry though, you don’t need to go off and learn Docker - this code has already been dockerised and uploaded as an image for you to use as you see fit! So, let’s instead develop the code that will perform the analysis. The code here is a simple script to parse the incoming data and produce a CSV file with the daily trading volume of Ethereum.
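The core of such a script is just "group by day, sum the value, write a CSV". Here is a dependency-free sketch of that logic using only the standard library; the script in the example itself uses pandas, so this is a swapped-in illustration rather than the packaged code. The `daily_volume` and `summarise` names are hypothetical, and the `block_timestamp` and `value` column names are taken from the pandas snippet in the talk.

```python
import csv
from collections import defaultdict
from datetime import datetime, timezone


def daily_volume(rows):
    """Sum `value` per UTC day. Each row is a dict with a
    `block_timestamp` (Unix seconds) and a `value` field."""
    totals = defaultdict(float)
    for row in rows:
        day = datetime.fromtimestamp(int(row["block_timestamp"]), tz=timezone.utc).date()
        totals[day.isoformat()] += float(row["value"])
    return dict(sorted(totals.items()))


def summarise(in_csv, out_csv):
    """Read a transactions CSV and write one row per day to `out_csv`."""
    with open(in_csv, newline="") as f:
        totals = daily_volume(csv.DictReader(f))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["block_datetime", "value"])
        writer.writerows(totals.items())
    return totals
```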
  34. Read slide then.. Let’s try it out!
  35. There’s no need to do this - as the image already exists on Docker - but in case you want to….!
  36. Bacalhau is a distributed computing platform that allows you to run jobs on a network of computers. It is designed to be easy to use and to run on a variety of hardware. To submit a job, you can use the Bacalhau CLI. The following command will run the container above on the IPFS data -- the long hash -- shown at the start of this notebook. Let's confirm that the results are as expected.
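If you want to drive the CLI from Python (handy in a notebook), one way is to build the command as a list and hand it to `subprocess`. This is a sketch under assumptions: the `--id-only`, `--wait` and `--input-volumes` flags are recalled from the Bacalhau docs of the time and may differ in your CLI version, and `docker_run_command` and the image name in the usage note are hypothetical.

```python
import shlex


def docker_run_command(image, cid, mount_path="/inputs/data.tar.gz"):
    """Build a `bacalhau docker run` command that mounts the IPFS
    content `cid` at `mount_path` inside the container.
    Flag names are assumptions, check `bacalhau docker run --help`."""
    return [
        "bacalhau", "docker", "run",
        "--id-only",   # print only the job id
        "--wait",      # block until the job finishes
        "--input-volumes", f"{cid}:{mount_path}",
        image,
    ]


def as_shell(cmd):
    """Render the command as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

To actually submit, pass the list to `subprocess.run(cmd, check=True, capture_output=True, text=True)` and read the job id from `stdout`, for example with `docker_run_command("example/eth-analysis:latest", "<your CID>")`.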
  37. Look at the docs if you want to understand more about what this means. If you’re familiar with docker, you’ll notice that some of these commands overlap with docker commands that perform the same function. Inspect: the docker run command used the outputs volume as a results folder, so when we download the results they will be stored in a folder within volumes/outputs. Let’s check this out in VS Code and see it happen in real time.
  38. We can re-plot these results to see if they are the same as the ones we got locally. And they are! But… we could do that locally!! Why bother using Bacalhau??? Well… what about the rest of the Ethereum data?? We want a full picture, not just a snapshot!
  39. We can run the same analysis on the entire Ethereum blockchain (up to the point where I have uploaded the Ethereum data). To do this, we need to run the analysis on each of the chunks of data that we have stored on IPFS. We can do this by running the same job on each of the chunks. Let’s see this in action in VS code so we can see what’s going on with the files… We’ll need to wait for all our jobs to complete (bacalhau list -n 50) Then we’ll download all the results and merge them into a single directory.
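Once every job's results are downloaded, the merge step can be sketched in a few lines of Python. This assumes each job was downloaded into its own folder and that the interesting outputs are CSV files somewhere underneath it; `merge_results` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def merge_results(job_dirs, merged_dir):
    """Copy every CSV found under each job's download folder into
    `merged_dir`, prefixing the file name with the job folder name
    so results from different chunks do not overwrite each other."""
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied = []
    for job_dir in map(Path, job_dirs):
        for src in sorted(job_dir.rglob("*.csv")):
            dest = merged / f"{job_dir.name}_{src.name}"
            shutil.copy2(src, dest)
            copied.append(dest.name)
    return copied
```

From there the merged folder can be loaded and plotted the same way as the single chunk, just over many more files.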
  41. So let’s get on to analysing this data! Show me the code!!
  45. Skip ahead if you just want me to show you the code!
  46. Let’s get to the example! https://bit.ly/bacalhaueth Here I’m going to run through this Ethereum data analysis with Bacalhau. You don’t need any prior knowledge of Bacalhau or data science to join in with me here either!
  47. Installing bacalhau