This document discusses running AI image generation models like Stable Diffusion on the Bacalhau decentralized computing platform. It begins with a demonstration of generating images from text using a Stable Diffusion model on Google Colab. It then provides an overview of Bacalhau and how it can be used to run models more efficiently by leveraging distributed computing resources. The document concludes by explaining how the Stable Diffusion model code was adapted to run on Bacalhau and generate images in a decentralized manner.
Bacalhau: Stable Diffusion on a GPU
1. AI-generated NFTs on FVM with Bacalhau Stable Diffusion
Ally Haire, Developer Relations Engineer (@DeveloperAlly)
2. Building a Text-to-Image Model (Stable Diffusion - GPU)
Ally Haire, Developer Relations Engineer (@DeveloperAlly)
3. Agenda (aka the timestamps…)
● Let’s see what we’re building in action!
● Bacal… what? The whys and hows of Bacalhau
● A brief intro to Machine Learning and Stable Diffusion (a text-to-image model)
● Show me the code! Coding up a text-to-image script
● Running on Bacalhau
6. The example in action: running our open-source DALL-E…
All anyone needs to do to run this example at any time is install Bacalhau (one line of code) and run this Docker image!
The slide’s callouts label the parts of the Bacalhau CLI Docker command: the Docker image saved in a registry Bacalhau can pull from, the Python script Docker runs, the output folder to save to, and the text-input flag passed to the Python script.
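As a sketch of those two steps, the commands below follow the pattern in the Bacalhau docs example; the install URL, image tag, flag names, and prompt are assumptions that may have changed, so check docs.bacalhau.org for the current versions.

```shell
# 1. Install the Bacalhau CLI (the "one line of code"):
curl -sL https://get.bacalhau.org/install.sh | bash

# 2. Run the pre-built Stable Diffusion image on a GPU node, passing the
#    output folder (--o) and text prompt (--p) through to the Python script:
bacalhau docker run --gpu 1 \
  ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- \
  python main.py --o ./outputs --p "an astronaut riding a horse"
```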
9. How do we make this?
That one time… when I asked ChatGPT what I needed to get started with ML models….
FYI: ChatGPT is a large language model trained with reinforcement learning.
Ahh, the paradox of asking ML for ML help ;)
10. Can I do this locally or in the cloud, though?
You could run this example locally. In fact, I did manage to get it running on my Mac M1 with a few code changes; however, computing the image from text took a good 20 minutes or more.
You could try running it in the cloud with more computing power, though I wasn’t able to find anywhere that provided a free-tier GPU for it (and I wasn’t willing to pay just to try something!).
12. Bacal… what??
Bacalhau is a network of open compute resources available to serve any data-processing workload.
- It’s simple to use (you don’t need an AI degree!)
- It requires minimal operational overhead or setup
- It’s decentralised-first (or edge-first) by design
- It aims to provide efficient distributed computation with batched tasks
Learn more about Bacalhau! @BacalhauProject
https://youtu.be/RZopDyTJ1pk
13. Bacalhau & FVM?
FVM: programmable data over small amounts of state.
Bacalhau: computation over this or any data, including big data, with support for GPUs.
Future: Bacalhau + FVM, calling Bacalhau from your smart contracts!
14. Bacalhau Platform Architecture
Bacalhau provides a platform for public, transparent, and optionally verifiable computation.
It enables users to run arbitrary Docker containers and WebAssembly (Wasm) images as tasks against data stored in the InterPlanetary File System (IPFS).
It operates as a peer-to-peer network of nodes, where each node has both a requester and a compute component.
18. AI and Machine Learning Quick Intro
Artificial Intelligence is an umbrella term for a few different concepts. AI is any technique that allows computers to bring meaning to data in ways similar to a human.
Machine learning is a subset of AI in which an application learns by itself; it has 3 main types:
• Supervised learning: regression & classification algorithms
• Unsupervised learning: clustering & association algorithms
• Reinforcement learning: value-, policy-, or model-based reinforcement methods
Deep learning is a subset of machine learning in which an application teaches itself to perform a specific task.
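As a toy illustration of the supervised-learning category above, here is a minimal sketch (mine, not the deck’s) that fits a line to labelled examples with plain NumPy; the data and variable names are invented for illustration.

```python
import numpy as np

# Labelled training data (inputs x, targets y): the supervised setting.
# The "true" relationship here is y = 2x + 1, which the model must recover.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a straight line y = w*x + b by least squares (a regression algorithm).
A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(float(w), 3), round(float(b), 3))  # recovers slope 2, intercept 1
```

Classification, clustering, and reinforcement learning follow the same shape: data in, a learned function out, differing mainly in whether (and how) the targets are labelled.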
19. Stable what now…?
Generically, diffusion is what happens when you put a couple of drops of dye into a bucket of water. Given time, the dye randomly disperses and eventually settles into a uniform distribution which colours all the water evenly.
In computer science, you define rules for your (dye) particles to follow and the medium this takes place in.
In our example, Stable Diffusion is a machine learning model used for text-to-image processing (like DALL-E), based on a diffusion probabilistic model that uses a transformer to generate images from text.
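To make the dye analogy concrete, here is a small sketch (mine, not the deck’s) of the forward noising process that diffusion probabilistic models are built on: repeatedly blending a signal with Gaussian noise until the original structure is gone. The step count and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "drop of dye": a structured 1-D signal.
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 256))
x0 = x.copy()

beta = 0.05  # fraction of noise mixed in at each step (illustrative schedule)
for _ in range(200):
    noise = rng.standard_normal(x.shape)
    # Each step keeps most of the signal and blends in a little noise;
    # the scaling preserves variance so values neither explode nor vanish.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

# After many steps the result is essentially pure noise: its correlation
# with the original signal is near zero.
print(abs(np.corrcoef(x0, x)[0, 1]))
```

A trained diffusion model learns to run this process in reverse, turning noise back into structure, and the text prompt (via a transformer) steers which structure it recovers.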
21. Tools & Environment
For this example we’ll need:
- Google Colab (for testing our scripts): https://colab.research.google.com/
- Optional: Docker (if you want to deploy your own Docker image)
You can run any of our docs examples in Google Colab!
22. Get & Start a Colab
You’ll need to add Colab from the Google Marketplace if you want to create your own notebooks (you can run our docs examples without this, though!)
23. Create a new notebook
Go to https://colab.research.google.com/
25. Show me the code
Install some of our Python dependencies:
- A fork of a Keras/TensorFlow implementation of Stable Diffusion: the text-to-image library
- Drivers for NVIDIA GPUs
- A lib for progress bars
- The TensorFlow library & add-ons, and a unicode fixer
26. text2image.py
This is the basic text-to-image script. It uses a Keras/TensorFlow implementation fork, generates the images from a given text string, and finally displays the generated image. The ML weights are pre-calculated in the library.
27. We can do better… stable-diffusion.py
This script adds input parameters to our text2image script and saves the output images to a file.
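A minimal sketch of what such a script might look like. The flag names (--p for the prompt, --o for the output folder) mirror the CLI command shown earlier, but the stable_diffusion_tf module path and the generate() signature are assumptions based on the Keras/TensorFlow fork the deck mentions; check the Bacalhau docs example for the real script.

```python
import argparse
import os


def parse_args(argv=None):
    # --p: the text prompt; --o: the folder to save generated images to.
    parser = argparse.ArgumentParser(description="Text-to-image with Stable Diffusion")
    parser.add_argument("--p", default="cod swimming through data", help="text prompt")
    parser.add_argument("--o", default="outputs", help="output folder")
    return parser.parse_args(argv)


def main():
    args = parse_args()
    os.makedirs(args.o, exist_ok=True)

    # Heavy imports are deferred so argument parsing stays cheap to test,
    # and guarded because the library only matters on a GPU worker.
    # NOTE: module path and generate() signature are assumptions (see above).
    try:
        from stable_diffusion_tf.stable_diffusion import StableDiffusion
        from PIL import Image
    except ImportError:
        print("stable_diffusion_tf / PIL not installed; see the docs example")
        return

    generator = StableDiffusion(img_height=512, img_width=512)
    images = generator.generate(args.p, num_steps=50, batch_size=1)
    Image.fromarray(images[0]).save(os.path.join(args.o, "image0.png"))


if __name__ == "__main__":
    main()
```

Because the prompt and output folder are now parameters, the same Docker image can serve any prompt passed on the Bacalhau command line, rather than baking the text into the script.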
32. Join the discussion:
- Twitter: @BacalhauProject
- YouTube: @bacalhauproject
- Slack: #bacalhau @filecoinproject
- GitHub: @bacalhau.org
- Forum: github.com/filecoin-project/bacalhau/discussions
See more examples: docs.bacalhau.org
Get involved in the future of data!
33. Alan Kay, Computer Scientist: “The best way to predict the future is to create it”
G’day Devs and Fil-ders!!
I’m Ally and I’m a Developer Relations Engineer working with the Filecoin Foundation and Protocol Labs.
Today I want to introduce you to a project we are very excited about, one that will hopefully help democratise the future of data processing: Bacalhau.
I’m going to show you a really cool example of how to build your own text-to-image code and then run it on Bacalhau, which, for those that haven’t heard of it, is not just a Portuguese fish but a peer-to-peer open computation network!
Warning from https://docs.oakhost.net/tutorials/tensorflow-apple-silicon/
Caveat on running the first example: it may not work on your machine, which is one of the exact reasons we have Bacalhau for large data!! Options are… (Colab notebook, paid cloud environment for testing)
So here are the timestamps for those that want to go directly to what they’re interested in.
First we’ll see this fully built model in action on Bacalhau, then I’ll chat a little about what Bacalhau is, how it works, and what advantages it can offer you.
I’ll then give a brief breakdown of what a Stable Diffusion model is and how it fits into the Machine Learning world.
And then I’ll move on to walking through how you can create this example end-to-end and run it on the Bacalhau network.
You can find the example I’m going through today in the Bacalhau docs, along with a host of other awesome examples you can try out for yourself.
In this video we’re going to be building, testing and running machine learning code with a machine learning model called stable diffusion (more on that later), which will take any text you provide it and transform that text into a funky and original image. Pretty cool right! I was excited to do this video and see how it works.
So, before we get started, let’s take a sneak peek of what our final example looks like for all the visual learners out there :)
And to clarify before we start - you don’t need any prior knowledge of Bacalhau or data science or any special developer environment or hardware to join in with me here either!
(install Bacalhau then run the Docker image on Bacalhau in Google Colab - we’ll explain it later though)
We will be using Python, though I’ll walk you through everything the code does, so if you have any sort of coding background, you’ll be more than capable of doing this yourself!
The main point here is: if you can write Python, Go, JavaScript, R - code in any language - and want to use ANY type of data, then Bacalhau is for you.
And… even if you don’t… if you can open the terminal on your computer and copy-paste two lines of code into it - one to install Bacalhau and the other to run this example that’s already been ‘uploaded’ to the network… (image should be loaded by now)… well then you can go ahead and use this on your machine without worrying about the rest of this tutorial! Though I hope you stick around to learn how to make more fun images like this one, and perhaps get some inspiration for building your own data projects on Bacalhau (and show them off to us!!)
Building and testing machine learning models can be a tricky business, mostly because of the compute power you need to train and run them.
Like most development, you need a few things to get started …
If you’ve never built one before - this is the section for you!
- like knowing a programming language for writing and running your machine learning code and setting up a good developer environment for the task you’re looking to do.
In fact, when I asked fellow ML model ChatGPT what I needed to get started with machine learning, it told me I needed:
A programming language for writing and running your machine learning code (OK, I know some Python - check!)
A machine learning framework or library that provides pre-built algorithms, tools, and other resources for building and training machine learning models. (Technically it’s not ESSENTIAL - you could build your own model implementation, but that’s hard and would be like building your own sort function. Luckily it’s not necessary - there are several open source libraries out there like TensorFlow, which we’ll use here, as well as PyTorch and scikit-learn, which you could also play around with - we’d love to see examples on these to include in a community cookbook if you do make one!)
A data management and analysis tool for managing and working with the data that you will use to train and evaluate your machine learning model. This could be a spreadsheet program like Microsoft Excel, or a specialised data analysis library like Pandas or NumPy. (In this case, as we’re just using a TensorFlow implementation where the model has been pre-trained for us, we won’t need to do any management or analysis of data to create it.)
(FYI - check out the landscape section in our docs for a comparison of the compute landscape too!)
A development environment or integrated development environment (IDE) for writing and managing your machine learning code. This could be a code editor like VS Code, or a more specialised environment like PyCharm or Jupyter Notebook. (I’m going to use Google Colab here alongside the VS Code editor, as unfortunately Google Colab does not support Docker images - which we’ll need after testing our Python code.)
AND finally……
A computing platform or cloud service for running your machine learning code and training your model. This could be your local machine, a dedicated server, or a cloud computing platform
And this last one is where things get complicated - even if you are familiar with all the other items on this list. Machine learning models chew up A LOT of computing power and can take a very long time to run - and if you thought compiling a large code base or waiting for an Ethereum transaction to be processed in a block was time consuming… well, machine learning model processing is what ping pong tables in the office were really made for.
Using your local machine for small examples is possible - in fact I did manage to get this particular example working on my (very unhappy about it) Mac M1. However, once you start doing bigger data processing, you are going to need more gas (Eth analogy intended), and if you don’t have a dedicated server lying around the house, you’re going to need a virtual machine on a cloud computing platform. Not only is that inefficient - the data sits an unknown distance from the computation machine - it can also get costly fast.
Luckily, these are some of the issues Bacalhau is trying to solve: making data processing and computation open and available to everyone, and speeding up processing times - firstly by using batch processing across multiple nodes, and secondly by putting the processing nodes where the data lives.
As I mentioned a bit earlier - Bacalhau is a decentralised computation network which provides a platform for public, transparent and optionally verifiable computation.
It was originally conceived to bring useful compute resources to data stored on the IPFS & Filecoin network - bringing the same benefits of open collaboration on datasets stored in IPFS and Filecoin to generic compute tasks.
I recommend this video by project lead David Aronchick if you want to hear more - check it out on the BacalhauProject YouTube channel.
And yes - for those of you following the Filecoin starmap - it will go hand-in-hand with the Filecoin Virtual Machine, Filecoin’s EVM-compatible layer one. While the FVM can offer programmable data over small amounts of state - like most on-chain computation - Bacalhau provides compute over that data, or any data, including big data, with support for GPUs. In the not too distant future you should even be able to leverage it by calling Bacalhau from your smart contracts - giving you the ability to interact directly with data stored on the Filecoin blockchain. A big win for developer experience and users! If you’re interested in this, keep an eye on Project Frog… a POC the team is working on now.
https://pl-strflt.notion.site/Project-Frog-FVM-Stable-Diffusion-Demo-6cb6c2f5c5614394a5468a5253b6c812 (need QR)
So, how does Bacalhau work?
As I mentioned, Bacalhau is a peer-to-peer network of nodes that enables users to run Docker containers or WebAssembly images as tasks against data stored in IPFS (the InterPlanetary File System), providing a platform for public, transparent, and optionally verifiable computation - known as Compute Over Data, or COD for short. Fun fact: this is where Bacalhau’s name comes from, as bacalhau is Portuguese for cod.
Bacalhau operates as a peer-to-peer network of nodes where each node has both a requestor and compute component. To interact with the cluster, Bacalhau CLI requests are sent to a node in the cluster (via JSON over HTTP), which then broadcasts messages over the transport layer to other nodes in the cluster. All other nodes in the network are connected to the transport layer and as such have a shared view of the world.
Architecture
Transport layer (interface)
Requester node (component)
Compute node (component)
Executor (interface)
Storage Provider (interface)
Verifier (interface)
Publisher (interface)
Job Lifecycle
Job Submission
Job Acceptance
Job Execution
Verification
Publishing
Networking
Input/Output volumes
This means that when a job is submitted to Bacalhau it is forwarded to a Bacalhau cluster node which acts as the requestor node.
This requestor node broadcasts the job to the other nodes in the peer-to-peer network, who can bid on the job - creating a job deal market.
This job deal also has a concurrency flag - meaning you can set the number of nodes you want to perform the job concurrently. The job also includes a confidence property, which defines how many verification proposals must agree for the job to be deemed successful, and a min-bids property, which defines the number of bids that must have been made before choosing to accept any.
Depending on the flags given to the requestor node (which can include concurrency, confidence, minimum bids before acceptance, reputation, locality, cost, hardware resources and even volumes such as IPFS CIDs), the requestor node accepts one or more matching job bids, and the accepted bids are then executed by the relevant compute nodes using the storage providers that the executor node has mapped in - for example the Docker executor and IPFS storage volumes.
Once the job is complete, a verification is generated which, if accepted, leads to the raw results folder being published by the compute node (the default publisher is Estuary). There is a lot more flexibility to this process, but the main thing to understand is that Bacalhau gives you - the user - the ability to execute a job where the data is already hosted, across a decentralised network of servers that store data. This saves you time, money and operational overheads, and also gives you referenceable, reproducible jobs that are easy to manage and maintain.
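To make the bid/accept flow above concrete, here’s a toy Python sketch of the job deal market. To be clear, this is an illustration only - the node names, random bid prices and the simplified verification step are all made up; the real network negotiates these over the transport layer.

```python
import random

def run_job(nodes, concurrency=3, min_bids=5, confidence=2):
    """Toy model of Bacalhau's job deal market (illustration only).

    The requestor broadcasts the job, waits for at least `min_bids`
    bids, accepts the `concurrency` cheapest ones, and deems the job
    successful once `confidence` verification proposals agree."""
    # Every node places a bid with a price for running the job.
    bids = [(node, random.uniform(1.0, 10.0)) for node in nodes]
    if len(bids) < min_bids:
        raise RuntimeError("not enough bids to accept the job yet")

    # Accept the `concurrency` cheapest matching bids.
    accepted = sorted(bids, key=lambda b: b[1])[:concurrency]

    # Each accepted compute node executes the job; in this toy model
    # every node produces the same (correct) result.
    results = ["results-folder-cid" for _node in accepted]

    # Verification: enough agreeing proposals -> job is successful.
    verified = results.count("results-folder-cid") >= confidence
    return accepted, verified

accepted, verified = run_job([f"node-{i}" for i in range(8)])
```

The real scheduler weighs far more than price (reputation, locality, hardware), but the shape of the flow - broadcast, bid, accept, execute, verify - is the same.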
Phew, now that we understand what’s going on under the hood, let’s take a quick look at what stable diffusion is before we dive into the code here!
Essentially, machine learning is a subset of AI focused on having computers provide insights into problems without explicitly programming them.
There are three main types of machine learning—supervised learning, unsupervised learning, and reinforcement learning.
Deep learning, which is the category Stable Diffusion falls under, is a subset of machine learning in which the application teaches itself to perform a specific task - in this case, converting a text input to an image output.
And Stable Diffusion is a model currently used for this text-to-image processing (the same family of diffusion techniques that powers models like DALL·E 2).
It is based on a diffusion probabilistic model that uses a transformer to generate images from text. In this example we’ll be using a pre-trained model in TensorFlow - Google’s open source machine learning library.
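For the curious, here’s a toy NumPy sketch of the “diffusion” idea itself - the forward process that gradually mixes data with Gaussian noise, which the model then learns to reverse. This is an illustration of the concept only, not part of the actual Stable Diffusion code; the schedule values are a commonly used linear example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear variance schedule: the noise level beta_t grows each step.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal retention per step

def forward_diffuse(x0, t):
    """Sample x_t from q(x_t | x_0):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x0 = np.ones(4)                      # stand-in for an "image"
x_early = forward_diffuse(x0, 10)    # still mostly signal
x_late = forward_diffuse(x0, T - 1)  # almost pure noise
```

Generation runs this in reverse: starting from pure noise, a trained network repeatedly removes a little noise per step, guided by your text prompt, until an image emerges.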
Now, you don’t really need to worry about the ins-and-outs of how Stable Diffusion works unless, like me, you’re curious - and if so, I encourage you to dig in further; there are lots of resources around that explain it!
All you really need to know here is that you can create your own text-to-image processor to run on the Bacalhau network, and you don’t need a data science degree or any special skills to do it. In fact, I’m hoping you’ll be inspired to make your own models and projects with it!
This example aims to show how easy it is to use Stable Diffusion on a GPU with the Bacalhau network.
So let’s get on to analysing this data! Show me the code!!
Yay the coding part!
I’ll be using Google Colab to go through this example. For those that may not have come across it before, Google Colaboratory allows you to write, execute and share computation files - like our Python and bash scripts - and runs in your browser by executing the code on a private virtual machine which you can configure. It’s based on the open source Jupyter notebook, which is used extensively in data science, and stores any notebooks you make in your Google Drive (or you can load them from GitHub). And its free tier works great for us here!
By the way, you can run any of the examples in the Bacalhau docs in Google Colab too!
[demo on setup]
If you want to follow along with me here - go ahead and set up google colab for yourself.
Alternatively you can just open the shared colab from the docs site without the need to install.
This is the first screen you’ll see if you open the colab url.
You can create a new notebook for this example there
Since this example uses a GPU-based environment, we’ll just switch our runtime environment over to a GPU from the runtime menu.
We don’t need a premium GPU for this one :)
Alrighty - so awesome! You have a fresh notebook. Let’s get started!
curl -sL https://get.bacalhau.org/install.sh | bash
bacalhau version
pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
pip install tqdm
apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2
This is the first script.
Yay the coding part!
And I think these look better than Patrick Collins’ pizza ;P
Insert an openlinks QR code here
Bacalhau - each section should contain:
What is this project
Why / how does it improve filecoin access / storage / usability etc.
WHEN / Latest projects
Where are the building gaps
CALL Any call to Action (why does this involve you - may be as above)
First we’ll download and take a look at just one of the IPFS files locally… (there is far more data than this in the overall collection though - this represents only about one chunk, or 100,000 blocks, of Ethereum data)
This code simply gets the IPFS tar file through an HTTP gateway and un-tars (decompresses) it.
wget -q -O file.tar.gz https://w3s.link/ipfs/bafybeifgqjvmzbtz427bne7af5tbndmvniabaex77us6l637gqtb2iwlwq
tar -xvf file.tar.gz
output_850000
We’ll use Pandas to create some columns for this data and plot it with Matplotlib.
Pandas is an open source data analysis tool built on top of the Python programming language.
We’re using it here to clean up the Ethereum data from the CSV file found in our output directory.
We can either run this directly from a python3 terminal instance, or by creating a script in our working directory and running that.
import pandas as pd
import glob
import matplotlib.pyplot as plt

# Find the transactions CSV inside the un-tarred output directory
file = glob.glob('output_*/transactions/start_block=*/end_block=*/transactions*.csv')[0]
print("Loading file %s" % file)
df = pd.read_csv(file)

# Cast the columns to their proper dtypes
df['value'] = df['value'].astype('float')
df['from_address'] = df['from_address'].astype('string')
df['to_address'] = df['to_address'].astype('string')
df['hash'] = df['hash'].astype('string')
df['block_hash'] = df['block_hash'].astype('string')

# Convert the unix block timestamp into a datetime for grouping
df['block_datetime'] = pd.to_datetime(df['block_timestamp'], unit='s')
df.info()

# Plot the total value transacted per day
df[['block_datetime', 'value']].groupby(pd.Grouper(key='block_datetime', freq='1D')).sum().plot()
plt.show()
This is cool - but the code here only inspects the daily trading volume of Ethereum for a single chunk (100,000 blocks) of data.
We can do better - We can use the Bacalhau client to download the data from IPFS and then run the analysis on the data in the cloud. This means that we can analyse the entire Ethereum blockchain without having to download it locally.
To run jobs on the Bacalhau network you need to package your code. In this example, the code is packaged as a Docker image. Don’t worry though, you don’t need to go off and learn Docker - this code has already been dockerised and uploaded as an image for you to use as you see fit!
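If you do want to see what that packaging involves, a minimal Dockerfile for a script like this might look something like the sketch below - note that the base image, file names and paths here are illustrative assumptions, not the contents of the published example image.

```dockerfile
# Illustrative Dockerfile sketch - base image and file names are
# assumptions, not the ones in the published Bacalhau example image.
FROM python:3.10-slim

RUN pip install --no-cache-dir pandas

WORKDIR /app
COPY main.py .

# Bacalhau mounts data as volumes: by convention the script would
# read from /inputs and write its CSV results to /outputs.
ENTRYPOINT ["python", "main.py"]
```

After building and pushing an image like this to a public registry, compute nodes on the network can pull and run it against the IPFS data.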
So, let’s instead develop the code that will perform the analysis. The code here is a simple script to parse the incoming data and produce a CSV file with the daily trading volume of Ethereum.
Read slide then..
Let’s try it out!
There’s no need to do this - as the image already exists on Docker Hub - but in case you want to…!
Bacalhau is a distributed computing platform that allows you to run jobs on a network of computers. It is designed to be easy to use and to run on a variety of hardware.
To submit a job, you can use the Bacalhau CLI.
The following command will run the container above on the IPFS data -- the long hash -- shown at the start of this notebook. Let's confirm that the results are as expected.
Look at docs if you want to understand more on what this means
If you’re familiar with Docker, you’ll notice some of these commands overlap and perform the same function.
Inspect:
The docker run command used the outputs volume as a results folder, so when we download the results they will be stored in a folder under volumes/outputs.
Let’s check this out in VS Code and see it happen in real time.
We can re-plot these results to see if they are the same as we got locally
And they are!
But… we could do that locally!! Why bother using Bacalhau???
Well… what about the rest of the ethereum data?? We want a full picture not just a snapshot!
We can run the same analysis on the entire Ethereum blockchain (up to the point where I have uploaded the Ethereum data). To do this, we need to run the analysis on each of the chunks of data that we have stored on IPFS. We can do this by running the same job on each of the chunks.
Let’s see this in action in VS code so we can see what’s going on with the files…
We’ll need to wait for all our jobs to complete (bacalhau list -n 50)
Then we’ll download all the results and merge them into a single directory.
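The merge step can be sketched as a short pandas helper that stacks every per-chunk results CSV back into one dataset and re-runs the daily aggregation over all of it. The default glob pattern and the column names here are assumptions based on the download layout and the earlier analysis script - adjust them to wherever your results actually live.

```python
import glob
import pandas as pd

def merge_results(pattern="volumes/outputs/**/*.csv"):
    """Stack every per-chunk results CSV into one DataFrame and
    re-aggregate the daily trading volume across all chunks.
    (The default glob pattern is an assumption - point it at the
    directory your downloaded results actually live in.)"""
    files = sorted(glob.glob(pattern, recursive=True))
    frames = [pd.read_csv(f, parse_dates=["block_datetime"])
              for f in files]
    merged = pd.concat(frames, ignore_index=True)
    # Same grouping as the single-chunk analysis, now over all chunks.
    return (merged
            .groupby(pd.Grouper(key="block_datetime", freq="1D"))["value"]
            .sum())
```

Plotting the returned series then gives the full-history daily volume chart rather than the single-chunk snapshot we saw earlier.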
So let’s get on to analysing this data! Show me the code!!
Skip ahead if you just want me to show you the code!
Let’s get to the example! https://bit.ly/bacalhaueth
Here I’m going to run through this Ethereum data analysis with Bacalhau.
You don’t need any prior knowledge of Bacalhau or data science to join in with me here either!