This document discusses running AI image generation models like Stable Diffusion on the Bacalhau decentralized computing platform. It begins with a demonstration of generating images from text using a Stable Diffusion model on Google Colab. It then provides an overview of Bacalhau and how it can be used to run models more efficiently by leveraging distributed computing resources. The document concludes by explaining how the Stable Diffusion model code was adapted to run on Bacalhau and generate images in a decentralized manner.
Bacalhau: Stable Diffusion on a GPU
1. AI-generated NFTs on FVM with Bacalhau Stable Diffusion
Ally Haire, Developer Relations Engineer (@DeveloperAlly)
2. Building a Text-to-Image Model (Stable Diffusion - GPU)
Ally Haire, Developer Relations Engineer (@DeveloperAlly)
3. Agenda (aka the timestamps…)
● Let’s see what we’re building in action!
● Bacal… what? The whys and hows of Bacalhau
● A brief intro to Machine Learning and Stable Diffusion (a text-to-image model)
● Show me the code! Coding up a text-to-image script
● Running on Bacalhau
6. The example in action: running our open-source DALL-E…
All anyone needs to do to run this example at any time is install Bacalhau (one line of code) and run this Docker image!
The slide’s callouts label the parts of the Bacalhau CLI Docker command: the Docker image saved in a registry Bacalhau can pull from, the Python script Docker runs, the output folder to save to, and the text-input flag passed to the Python script.
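As a sketch of those two steps, the commands below follow the pattern in the Bacalhau docs example; the install URL, image tag, flag names, and prompt are assumptions that may have changed, so check docs.bacalhau.org for the current versions.

```shell
# 1. Install the Bacalhau CLI (the "one line of code"):
curl -sL https://get.bacalhau.org/install.sh | bash

# 2. Run the pre-built Stable Diffusion image on a GPU node, passing the
#    output folder (--o) and text prompt (--p) through to the Python script:
bacalhau docker run --gpu 1 \
  ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- \
  python main.py --o ./outputs --p "an astronaut riding a horse"
```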
9. How do we make this?
That one time… when I asked ChatGPT what I needed to get started with ML models….
FYI: ChatGPT is a large language model trained with reinforcement learning.
Ahh, the paradox of asking ML for ML help ;)
10. Can I do this locally or in the cloud, though?
You could run this example locally. In fact, I did manage to get it running on my Mac M1 with a few code changes; however, computing the image from text took a good 20 minutes or more.
You could try running it in the cloud with more computing power, though I wasn’t able to find anywhere that provided a free-tier GPU for it (and I wasn’t willing to pay just to try something!).
12. Bacal… what??
Bacalhau is a network of open compute resources available to serve any data-processing workload.
- It’s simple to use (you don’t need an AI degree!)
- It requires minimal operational overhead or setup
- It’s decentralised-first (or edge-first) by design
- It aims to provide efficient distributed computation with batched tasks
Learn more about Bacalhau! @BacalhauProject
https://youtu.be/RZopDyTJ1pk
13. Bacalhau & FVM?
FVM: programmable data over small amounts of state.
Bacalhau: computation over this or any data, including big data, with support for GPUs.
Future: Bacalhau + FVM, calling Bacalhau from your smart contracts!
14. Bacalhau Platform Architecture
Bacalhau provides a platform for public, transparent, and optionally verifiable computation.
It enables users to run arbitrary Docker containers and WebAssembly (Wasm) images as tasks against data stored in the InterPlanetary File System (IPFS).
It operates as a peer-to-peer network of nodes, where each node has both a requester and a compute component.
18. AI and Machine Learning Quick Intro
Artificial Intelligence is an umbrella term for a few different concepts. AI is any technique that allows computers to bring meaning to data in ways similar to a human.
Machine learning is a subset of AI in which an application learns by itself; it has 3 main types:
• Supervised learning: regression & classification algorithms
• Unsupervised learning: clustering & association algorithms
• Reinforcement learning: value-, policy-, or model-based reinforcement methods
Deep learning is a subset of machine learning in which an application teaches itself to perform a specific task.
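As a toy illustration of the supervised-learning category above, here is a minimal sketch (mine, not the deck’s) that fits a line to labelled examples with plain NumPy; the data and variable names are invented for illustration.

```python
import numpy as np

# Labelled training data (inputs x, targets y): the supervised setting.
# The "true" relationship here is y = 2x + 1, which the model must recover.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a straight line y = w*x + b by least squares (a regression algorithm).
A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(float(w), 3), round(float(b), 3))  # recovers slope 2, intercept 1
```

Classification, clustering, and reinforcement learning follow the same shape: data in, a learned function out, differing mainly in whether (and how) the targets are labelled.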
19. Stable what now…?
Generically, diffusion is what happens when you put a couple of drops of dye into a bucket of water. Given time, the dye randomly disperses and eventually settles into a uniform distribution which colours all the water evenly.
In computer science, you define rules for your (dye) particles to follow and the medium this takes place in.
In our example, Stable Diffusion is a machine learning model used for text-to-image processing (like DALL-E), based on a diffusion probabilistic model that uses a transformer to generate images from text.
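To make the dye analogy concrete, here is a small sketch (mine, not the deck’s) of the forward noising process that diffusion probabilistic models are built on: repeatedly blending a signal with Gaussian noise until the original structure is gone. The step count and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "drop of dye": a structured 1-D signal.
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 256))
x0 = x.copy()

beta = 0.05  # fraction of noise mixed in at each step (illustrative schedule)
for _ in range(200):
    noise = rng.standard_normal(x.shape)
    # Each step keeps most of the signal and blends in a little noise;
    # the scaling preserves variance so values neither explode nor vanish.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

# After many steps the result is essentially pure noise: its correlation
# with the original signal is near zero.
print(abs(np.corrcoef(x0, x)[0, 1]))
```

A trained diffusion model learns to run this process in reverse, turning noise back into structure, and the text prompt (via a transformer) steers which structure it recovers.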
21. Tools & Environment
For this example we’ll need:
- Google Colab (for testing our scripts): https://colab.research.google.com/
- Optional: Docker (if you want to deploy your own Docker image)
You can run any of our docs examples in Google Colab!
22. Get & Start a Colab
You’ll need to add Colab from the Google Marketplace if you want to create your own notebooks (you can run our docs examples without this, though!)
23. Create a new notebook
Go to https://colab.research.google.com/
25. Show me the code
Install some of our Python dependencies:
- A fork of a Keras/TensorFlow implementation of Stable Diffusion: the text-to-image library
- Drivers for NVIDIA GPUs
- A lib for progress bars
- The TensorFlow library & add-ons, and a unicode fixer
26. text2image.py
This is the basic text-to-image script. It uses a Keras/TensorFlow implementation fork, generates the images from a given text string, and finally displays the generated image. The ML weights are pre-calculated in the library.
27. We can do better… stable-diffusion.py
This script adds input parameters to our text2image script and saves the output images to a file.
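A minimal sketch of what such a script might look like. The flag names (--p for the prompt, --o for the output folder) mirror the CLI command shown earlier, but the stable_diffusion_tf module path and the generate() signature are assumptions based on the Keras/TensorFlow fork the deck mentions; check the Bacalhau docs example for the real script.

```python
import argparse
import os


def parse_args(argv=None):
    # --p: the text prompt; --o: the folder to save generated images to.
    parser = argparse.ArgumentParser(description="Text-to-image with Stable Diffusion")
    parser.add_argument("--p", default="cod swimming through data", help="text prompt")
    parser.add_argument("--o", default="outputs", help="output folder")
    return parser.parse_args(argv)


def main():
    args = parse_args()
    os.makedirs(args.o, exist_ok=True)

    # Heavy imports are deferred so argument parsing stays cheap to test,
    # and guarded because the library only matters on a GPU worker.
    # NOTE: module path and generate() signature are assumptions (see above).
    try:
        from stable_diffusion_tf.stable_diffusion import StableDiffusion
        from PIL import Image
    except ImportError:
        print("stable_diffusion_tf / PIL not installed; see the docs example")
        return

    generator = StableDiffusion(img_height=512, img_width=512)
    images = generator.generate(args.p, num_steps=50, batch_size=1)
    Image.fromarray(images[0]).save(os.path.join(args.o, "image0.png"))


if __name__ == "__main__":
    main()
```

Because the prompt and output folder are now parameters, the same Docker image can serve any prompt passed on the Bacalhau command line, rather than baking the text into the script.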
32. Join the discussion:
- Twitter: @BacalhauProject
- YouTube: @bacalhauproject
- Slack: #bacalhau @filecoinproject
- GitHub: @bacalhau.org
- Forum: github.com/filecoin-project/bacalhau/discussions
See more examples: docs.bacalhau.org
Get involved in the future of data!
33. Alan Kay, Computer Scientist: “The best way to predict the future is to create it”
G’day Devs and Fil-ders!!
I’m Ally and I’m a Developer Relations Engineer working with the Filecoin Foundation and Protocol Labs.
Today I want to introduce you to a project we are very excited about, one that will hopefully help democratise the future of data processing: Bacalhau.
I’m going to show you a really cool example of how to build your own text-to-image code and then run it on Bacalhau, which, for those that haven’t heard of it, is not just a Portuguese fish but a peer-to-peer open computation network!
Warning from https://docs.oakhost.net/tutorials/tensorflow-apple-silicon/
Caveat on running the first example: it may not work on your machine, which is one of the exact reasons we have Bacalhau for large data!! Options are… (Colab notebook, paid cloud environment for testing)
So here are the timestamps for those that want to go directly to what they’re interested in.
First we’ll see this fully built model in action on Bacalhau, then I’ll chat a little about what Bacalhau is, how it works, and what advantages it can offer you.
I’ll then give a brief breakdown of what a Stable Diffusion model is and how it fits into the Machine Learning world.
And then I’ll move on to walking through how you can create this example end-to-end and run it on the Bacalhau network.
You can find the example I’m going through today in the Bacalhau docs, along with a host of other awesome examples you can try out for yourself.
In this video we’re going to be building, testing and running machine learning code with a machine learning model called stable diffusion (more on that later), which will take any text you provide it and transform that text into a funky and original image. Pretty cool right! I was excited to do this video and see how it works.
So, before we get started, let’s take a sneak peek of what our final example looks like for all the visual learners out there :)
And to clarify before we start - you don’t need any prior knowledge of Bacalhau or data science or any special developer environment or hardware to join in with me here either!
(install Bacalhau then run the Docker image on Bacalhau in Google Colab - we’ll explain it later though)
We will be using Python, though I’ll walk you through everything the code does, so if you have any sort of coding background, you’ll be more than capable of doing this yourself!
The main point here is: if you can write Python, Go, JavaScript, R - code in any language - and want to use ANY type of data, then Bacalhau is for you.
And… even if you don’t… if you can open the terminal on your computer and copy-paste two lines of code into it - one to install Bacalhau and the other to run this example that’s already been ‘uploaded’ to the network… (image should be loaded by now)… well then you can go ahead and use this on your machine without worrying about the rest of this tutorial! Though I hope you stick around to learn how to make more fun images like this one, and perhaps get some inspiration for building your own data projects on Bacalhau (and show them off to us!!)
Building and testing machine learning models can be a tricky business, mostly because of the compute power you need to train and run them.
Like most development, you need a few things to get started …
If you’ve never built one before - this is the section for you!
- like knowing a programming language for writing and running your machine learning code and setting up a good developer environment for the task you’re looking to do.
In fact, when I asked fellow ML model ChatGPT what I needed to get started with machine learning, it told me I needed:
A programming language for writing and running your machine learning code (OK, I know some Python - check!)
A machine learning framework or library that provides pre-built algorithms, tools, and other resources for building and training machine learning models. (Technically it’s not ESSENTIAL - you could build your own model implementation, but that’s hard and would be like building your own sort function. Luckily it’s not necessary - there are several open source libraries out there like TensorFlow, which we’ll use here, as well as PyTorch and scikit-learn, which you could also play around with - we’d love to see examples on these to include in a community cookbook if you do make one!)
A data management and analysis tool for managing and working with the data that you will use to train and evaluate your machine learning model. This could be a spreadsheet program like Microsoft Excel, or a specialised data analysis library like Pandas or NumPy. (In this case, as we’re just using a TensorFlow implementation where the model has been pre-trained for us, we won’t need to do any management or analysis of data to create it.)
(FYI - check out the landscape section in our docs for a comparison of the compute landscape too!)
A development environment or integrated development environment (IDE) for writing and managing your machine learning code. This could be a code editor like VS Code, or a more specialised environment like PyCharm or Jupyter Notebook. (I’m going to use Google Colab here alongside the VS Code editor, as unfortunately Google Colab does not support Docker images - which we’ll need after testing our Python code.)
AND finally……
A computing platform or cloud service for running your machine learning code and training your model. This could be your local machine, a dedicated server, or a cloud computing platform
And this last one is where things get complicated - even if you are familiar with all the other items on this list. Machine learning models chew up A LOT of computing power and can take a very long time to run - and if you thought compiling a large code base or waiting for an Ethereum transaction to be processed in a block was time consuming… well, machine learning model processing is what ping pong tables in the office were really made for.
Using your local machine for small examples is possible - in fact I did manage to get this particular example working on my (very unhappy about it) Mac M1. However, once you start doing bigger data processing, you are going to need more gas (Eth analogy intended), and if you don’t have a dedicated server lying around the house, you’re going to need a virtual machine on a cloud computing platform. Not only is that inefficient - the data sits an unknown distance from the computation machine - it can also get costly fast.
Luckily, these are some of the issues Bacalhau is trying to solve: making data processing and computation open and available to everyone, and speeding up processing times - firstly by using batch processing across multiple nodes, and secondly by putting the processing nodes where the data lives.
As I mentioned a bit earlier - Bacalhau is a decentralised computation network which provides a platform for public, transparent and optionally verifiable computation.
It was originally conceived to bring useful compute resources to data stored on the IPFS & Filecoin network - bringing the same benefits of open collaboration on datasets stored in IPFS and Filecoin to generic compute tasks.
I recommend this video by project lead David Aronchick if you want to hear more - check it out on the BacalhauProject YouTube channel.
And yes - for those of you following the Filecoin starmap - it will go hand-in-hand with the Filecoin Virtual Machine, Filecoin’s EVM-compatible layer one. While the FVM can offer programmable data over small amounts of state - like most on-chain computation - Bacalhau provides compute over that data, or any data, including big data, with support for GPUs. In the not too distant future you should even be able to leverage it by calling Bacalhau from your smart contracts - giving you the ability to interact directly with data stored on the Filecoin blockchain. A big win for developer experience and users! If you’re interested in this, keep an eye on Project Frog… a POC the team is working on now.
https://pl-strflt.notion.site/Project-Frog-FVM-Stable-Diffusion-Demo-6cb6c2f5c5614394a5468a5253b6c812 (need QR)
So, how does Bacalhau work?
As I mentioned, Bacalhau is a peer-to-peer network of nodes that enables users to run Docker containers or WebAssembly images as tasks against data stored in IPFS (the InterPlanetary File System), providing a platform for public, transparent, and optionally verifiable computation - known as Compute Over Data, or COD for short. Fun fact: this is where Bacalhau’s name comes from, as bacalhau is Portuguese for cod.
Bacalhau operates as a peer-to-peer network of nodes where each node has both a requestor and compute component. To interact with the cluster, Bacalhau CLI requests are sent to a node in the cluster (via JSON over HTTP), which then broadcasts messages over the transport layer to other nodes in the cluster. All other nodes in the network are connected to the transport layer and as such have a shared view of the world.
Architecture
Transport layer (interface)
Requester node (component)
Compute node (component)
Executor (interface)
Storage Provider (interface)
Verifier (interface)
Publisher (interface)
Job Lifecycle
Job Submission
Job Acceptance
Job Execution
Verification
Publishing
Networking
Input/Output volumes
This means that when a job is submitted to Bacalhau it is forwarded to a Bacalhau cluster node which acts as the requestor node.
This requestor node broadcasts the job to the other nodes in the peer-to-peer network, who can bid on the job - creating a job deal market.
This job deal also has a concurrency flag - meaning you can set the number of nodes you want to perform the job concurrently. The job also includes a confidence property, which defines how many verification proposals must agree for the job to be deemed successful, and a min-bids property, which defines the number of bids that must have been made before choosing to accept any.
Depending on the flags given to the requestor node (which can include concurrency, confidence, minimum bids before acceptance, reputation, locality, cost, hardware resources and even volumes such as IPFS CIDs), the requestor node accepts one or more matching job bids, and the accepted bids are then executed by the relevant compute nodes using the storage providers that the executor node has mapped in - for example the Docker executor and IPFS storage volumes.
Once the job is complete, a verification is generated which, if accepted, leads to the raw results folder being published by the compute node (the default publisher is Estuary). There is a lot more flexibility to this process, but the main thing to understand is that Bacalhau gives you - the user - the ability to execute a job where the data is already hosted, across a decentralised network of servers that store data. This saves you time, money and operational overheads, and also gives you referenceable, reproducible jobs that are easy to manage and maintain.
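To make the bid/accept flow above concrete, here’s a toy Python sketch of the job deal market. To be clear, this is an illustration only - the node names, random bid prices and the simplified verification step are all made up; the real network negotiates these over the transport layer.

```python
import random

def run_job(nodes, concurrency=3, min_bids=5, confidence=2):
    """Toy model of Bacalhau's job deal market (illustration only).

    The requestor broadcasts the job, waits for at least `min_bids`
    bids, accepts the `concurrency` cheapest ones, and deems the job
    successful once `confidence` verification proposals agree."""
    # Every node places a bid with a price for running the job.
    bids = [(node, random.uniform(1.0, 10.0)) for node in nodes]
    if len(bids) < min_bids:
        raise RuntimeError("not enough bids to accept the job yet")

    # Accept the `concurrency` cheapest matching bids.
    accepted = sorted(bids, key=lambda b: b[1])[:concurrency]

    # Each accepted compute node executes the job; in this toy model
    # every node produces the same (correct) result.
    results = ["results-folder-cid" for _node in accepted]

    # Verification: enough agreeing proposals -> job is successful.
    verified = results.count("results-folder-cid") >= confidence
    return accepted, verified

accepted, verified = run_job([f"node-{i}" for i in range(8)])
```

The real scheduler weighs far more than price (reputation, locality, hardware), but the shape of the flow - broadcast, bid, accept, execute, verify - is the same.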
Phew, now that we understand what’s going on under the hood, let’s take a quick look at what stable diffusion is before we dive into the code here!
Essentially, machine learning is a subset of AI focused on having computers provide insights into problems without explicitly programming them.
There are three main types of machine learning—supervised learning, unsupervised learning, and reinforcement learning.
Deep learning, which is the category Stable Diffusion falls under, is a subset of machine learning in which the application teaches itself to perform a specific task - in this case, converting a text input to an image output.
And Stable Diffusion is a model currently used for this text-to-image processing (the same family of diffusion techniques that powers models like DALL·E 2).
It is based on a diffusion probabilistic model that uses a transformer to generate images from text. In this example we’ll be using a pre-trained model in TensorFlow - Google’s open source machine learning library.
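For the curious, here’s a toy NumPy sketch of the “diffusion” idea itself - the forward process that gradually mixes data with Gaussian noise, which the model then learns to reverse. This is an illustration of the concept only, not part of the actual Stable Diffusion code; the schedule values are a commonly used linear example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear variance schedule: the noise level beta_t grows each step.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal retention per step

def forward_diffuse(x0, t):
    """Sample x_t from q(x_t | x_0):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x0 = np.ones(4)                      # stand-in for an "image"
x_early = forward_diffuse(x0, 10)    # still mostly signal
x_late = forward_diffuse(x0, T - 1)  # almost pure noise
```

Generation runs this in reverse: starting from pure noise, a trained network repeatedly removes a little noise per step, guided by your text prompt, until an image emerges.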
Now, you don’t really need to worry about the ins-and-outs of how Stable Diffusion works unless, like me, you’re curious - and if so, I encourage you to dig in further; there are lots of resources around that explain it!
All you really need to know here is that you can create your own text-to-image processor to run on the Bacalhau network, and you don’t need a data science degree or any special skills to do it. In fact, I’m hoping you’ll be inspired to make your own models and projects with it!
This example aims to show how easy it is to use Stable Diffusion on a GPU with the Bacalhau network.
So let’s get on to analysing this data! Show me the code!!
Yay the coding part!
I’ll be using Google Colab to go through this example. For those that may not have come across it before, Google Colaboratory allows you to write, execute and share computation files - like our Python and bash scripts - and runs in your browser by executing the code on a private virtual machine which you can configure. It’s based on the open source Jupyter notebook, which is used extensively in data science, and stores any notebooks you make in your Google Drive (or you can load them from GitHub). And its free tier works great for us here!
By the way, you can run any of the examples in the Bacalhau docs in Google Colab too!
[demo on setup]
If you want to follow along with me here - go ahead and set up google colab for yourself.
Alternatively you can just open the shared colab from the docs site without the need to install.
This is the first screen you’ll see if you open the colab url.
You can create a new notebook for this example there
Since this example uses a GPU-based environment, we’ll just switch our runtime environment over to a GPU from the runtime menu.
We don’t need a premium GPU for this one :)
Alrighty - so awesome! You have a fresh notebook. Let’s get started!
curl -sL https://get.bacalhau.org/install.sh | bash
bacalhau version
pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
pip install tqdm
apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2
This is the first script.
Yay the coding part!
And I think these look better than Patrick Collins’ pizza ;P
Insert an openlinks QR code here
Bacalhau - each section should contain:
What is this project
Why / how does it improve filecoin access / storage / usability etc.
WHEN / Latest projects
Where are the building gaps
CALL Any call to Action (why does this involve you - may be as above)
First we’ll download and take a look at just one of the IPFS files locally… (there is far more data than this in the overall collection though - this represents only about one chunk, or 100,000 blocks, of Ethereum data)
This code simply gets the IPFS tar file through an HTTP gateway and un-tars (decompresses) it.
wget -q -O file.tar.gz https://w3s.link/ipfs/bafybeifgqjvmzbtz427bne7af5tbndmvniabaex77us6l637gqtb2iwlwq
tar -xvf file.tar.gz
output_850000
We’ll use Pandas to create some columns for this data and plot it with Matplotlib.
Pandas is an open source data analysis tool built on top of the Python programming language.
We’re using it here to clean up the Ethereum data from the CSV file found in our output directory.
We can either run this directly from a python3 terminal instance, or by creating a script in our working directory and running that.
import pandas as pd
import glob
import matplotlib.pyplot as plt

# Find the transactions CSV inside the un-tarred output directory
file = glob.glob('output_*/transactions/start_block=*/end_block=*/transactions*.csv')[0]
print("Loading file %s" % file)
df = pd.read_csv(file)

# Cast the columns to their proper dtypes
df['value'] = df['value'].astype('float')
df['from_address'] = df['from_address'].astype('string')
df['to_address'] = df['to_address'].astype('string')
df['hash'] = df['hash'].astype('string')
df['block_hash'] = df['block_hash'].astype('string')

# Convert the unix block timestamp into a datetime for grouping
df['block_datetime'] = pd.to_datetime(df['block_timestamp'], unit='s')
df.info()

# Plot the total value transacted per day
df[['block_datetime', 'value']].groupby(pd.Grouper(key='block_datetime', freq='1D')).sum().plot()
plt.show()
This is cool - but the code here only inspects the daily trading volume of Ethereum for a single chunk (100,000 blocks) of data.
We can do better - We can use the Bacalhau client to download the data from IPFS and then run the analysis on the data in the cloud. This means that we can analyse the entire Ethereum blockchain without having to download it locally.
To run jobs on the Bacalhau network you need to package your code. In this example, the code is packaged as a Docker image. Don’t worry though, you don’t need to go off and learn Docker - this code has already been dockerised and uploaded as an image for you to use as you see fit!
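If you do want to see what that packaging involves, a minimal Dockerfile for a script like this might look something like the sketch below - note that the base image, file names and paths here are illustrative assumptions, not the contents of the published example image.

```dockerfile
# Illustrative Dockerfile sketch - base image and file names are
# assumptions, not the ones in the published Bacalhau example image.
FROM python:3.10-slim

RUN pip install --no-cache-dir pandas

WORKDIR /app
COPY main.py .

# Bacalhau mounts data as volumes: by convention the script would
# read from /inputs and write its CSV results to /outputs.
ENTRYPOINT ["python", "main.py"]
```

After building and pushing an image like this to a public registry, compute nodes on the network can pull and run it against the IPFS data.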
So, let’s instead develop the code that will perform the analysis. The code here is a simple script to parse the incoming data and produce a CSV file with the daily trading volume of Ethereum.
Read slide then..
Let’s try it out!
There’s no need to do this - as the image already exists on Docker Hub - but in case you want to…!
Bacalhau is a distributed computing platform that allows you to run jobs on a network of computers. It is designed to be easy to use and to run on a variety of hardware.
To submit a job, you can use the Bacalhau CLI.
The following command will run the container above on the IPFS data -- the long hash -- shown at the start of this notebook. Let's confirm that the results are as expected.
Look at docs if you want to understand more on what this means
If you’re familiar with Docker, you’ll notice some of these commands overlap and perform the same function.
Inspect:
The docker run command used the outputs volume as a results folder, so when we download the results they will be stored in a folder under volumes/outputs.
Let’s check this out in VS Code and see it happen in real time.
We can re-plot these results to see if they are the same as we got locally
And they are!
But… we could do that locally!! Why bother using Bacalhau???
Well… what about the rest of the ethereum data?? We want a full picture not just a snapshot!
We can run the same analysis on the entire Ethereum blockchain (up to the point where I have uploaded the Ethereum data). To do this, we need to run the analysis on each of the chunks of data that we have stored on IPFS. We can do this by running the same job on each of the chunks.
Let’s see this in action in VS code so we can see what’s going on with the files…
We’ll need to wait for all our jobs to complete (bacalhau list -n 50)
Then we’ll download all the results and merge them into a single directory.
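The merge step can be sketched as a short pandas helper that stacks every per-chunk results CSV back into one dataset and re-runs the daily aggregation over all of it. The default glob pattern and the column names here are assumptions based on the download layout and the earlier analysis script - adjust them to wherever your results actually live.

```python
import glob
import pandas as pd

def merge_results(pattern="volumes/outputs/**/*.csv"):
    """Stack every per-chunk results CSV into one DataFrame and
    re-aggregate the daily trading volume across all chunks.
    (The default glob pattern is an assumption - point it at the
    directory your downloaded results actually live in.)"""
    files = sorted(glob.glob(pattern, recursive=True))
    frames = [pd.read_csv(f, parse_dates=["block_datetime"])
              for f in files]
    merged = pd.concat(frames, ignore_index=True)
    # Same grouping as the single-chunk analysis, now over all chunks.
    return (merged
            .groupby(pd.Grouper(key="block_datetime", freq="1D"))["value"]
            .sum())
```

Plotting the returned series then gives the full-history daily volume chart rather than the single-chunk snapshot we saw earlier.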
So let’s get on to analysing this data! Show me the code!!
Skip ahead if you just want me to show you the code!
Let’s get to the example! https://bit.ly/bacalhaueth
Here I’m going to run through this Ethereum data analysis with Bacalhau.
You don’t need any prior knowledge of Bacalhau or data science to join in with me here either!