SlideShare a Scribd company logo
Samy Fodil
Founder & CEO of Taubyte
WebAssembly
is Key to Better
LLM
Performance
Samy Fodil
Founder & CEO of Taubyte
Taubyte
An Open Source Cloud
Platform on Autopilot, where
coding in local environment
equals scaling to global
production. 🚀
Local Coding
Global Production
Taubyte Locally
Taubyte In Prod
https://tau.how
LLMs
Generative AI
INPUT
Large Language Models
OUTPUT
Proprietary
Open Source
✅Train
✅Host
Large Language Models
LLM Training
LLM Inference
📦Huge PyTorch Images
🔗Complex Dependencies
🔒Lock GPU ressources
🗿Primitive Orchestration
Problems
🪽Lightweight
🪣Sand-boxed
⚡Easy to orchestrate
LLM Inference
WASM to the rescue
🪽 Lightweight
<2MB ~4GB
🪽 Lightweight
⚡Fast to provision
🌱Cheap to cache
Closer to Data (Data Gravity) &
User (Edge Computing)
🪣Sand-boxed
🛡️Secure
🕹️Interfaces
Which means No, Restricted, Virtual or
Mocked: Networking, Filesystem and more
LLM Inference
You can use transformers
directly from python. This
will result in PyTorch and
other dependencies.
📦Huge PyTorch Images
🔗Complex Dependencies
🔒Lock GPU ressources
LLM Inference
You can for example use onnx, llama.cpp, candle (which
wrapps llama.cpp) or TensorRT-LLM and have a lower foot
print. But it comes with a few challenges:
🔒Lock GPU ressources
🧠Way harder to implement
Scaling LLM Inference (lvl1)
🎹Orchestration
🌱Caching
Inference Engines for LLM
🤹Load balancing?
🚦Routing?
With WASM
If we made LLM and AI, in general, available through a set of
common host calls we can combine benfits of
Inference Engines
🎹Orchestration
🌱Caching
WebAssembly
🪽Lightweight
🪣Sand-boxed
🤹Load balancing?
🚦Routing?
github.com/taubyte/tau
Will provide:
🤹Load-balancing
🚦Routing
🎹Orchestration
It’ll also provide abstractions so what’s built locally will work
in production with no changes.
APP HTTP Inference Engine
HTTP
HTTP
WASM
MODULE
HOST CALL
HOST CALL
HOST CALL
HTTP
HTTP
Inference
Host
Module
The idea
1️⃣
2️⃣
Implementation of 1️⃣
tau implement a protocols called `gateway` that will
determine what host will be best suited to serve the request
based on:
WebAssembly caching and dependency modules
availability (including host modules)
Host Module Resource availability
Host Resource availability
Other constraints defined by developer like data gravity
Node Running
Gateway Proto
Gateway Protocol
Serving Node
(Substrate)
Serving Node
(Substrate)
Serving Node
(Substrate)
Satellite
(i.e. LLM Inference)
Satellite
Satellite
MUXED
TUNNEL
MUXED TUNNEL
M
UXED
TUN
N
EL
ORBIT PROTO
ORBIT PROTO
ORBIT PROTO
Host Module Resource
Availability
This is still to be implemented feature that will ask Host
Module to provide basic metrics for the gateway:
Caching score. Example: is the particular model loaded.
Resources Availability Score. Example: Is there enough
GPU mem to spin-up the model
Queue Score. Example: If Host Module uses queues, how
filled is the queue.
github.com/taubyte/vm-orbit
Extending WebAssembly Runtime in a secure way.
orbit
external
process
PROXY WASM CALL
PROXY ACCESS TO MODULE MEMORY
Example
github.com/ollama-cloud
trunings ollama into a satellite
Compile llama.cpp
Build plugin
Install dreamland
Start local Cloud
Attach plug-in
Login to Local Cloud
Create a project
Create a function
Call Generate
Stream Tokens
No SDK!
Check my previous project
github.com/samyfodil/taubyte-llama-satellite
Which actually has a nice SDK!
Trigger the function
- Copy ollama plugin to /tb/plugins
- Add it to config
In production
Your Application is Live!
taubyte/dllama
Ready for Cloud with better backends
Always local friendly!
Thanks!

More Related Content

Similar to WebAssembly is Key to Better LLM Performance

Node js meetup
Node js meetupNode js meetup
Node js meetup
Ansuman Roy
 
Open Source XMPP for Cloud Services
Open Source XMPP for Cloud ServicesOpen Source XMPP for Cloud Services
Open Source XMPP for Cloud Services
mattjive
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
Márton Kodok
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Timothy Spann
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
markgrover
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
Praveen Gollakota
 
Import golang; struct microservice - Codemotion Rome 2015
Import golang; struct microservice - Codemotion Rome 2015Import golang; struct microservice - Codemotion Rome 2015
Import golang; struct microservice - Codemotion Rome 2015
Giorgio Cefaro
 
WASM Beyond the Browser [2022_07_21 meetup]
WASM Beyond the Browser [2022_07_21 meetup]WASM Beyond the Browser [2022_07_21 meetup]
WASM Beyond the Browser [2022_07_21 meetup]
Salesforce
 
Nodejs
NodejsNodejs
Where should I run my code? Serverless, Containers, Virtual Machines and more
Where should I run my code? Serverless, Containers, Virtual Machines and moreWhere should I run my code? Serverless, Containers, Virtual Machines and more
Where should I run my code? Serverless, Containers, Virtual Machines and more
Bret McGowen - NYC Google Developer Advocate
 
Madrid meetup #7 deployment models
Madrid meetup #7   deployment modelsMadrid meetup #7   deployment models
Madrid meetup #7 deployment models
Mario Alberto Martinez Lopez
 
Monkey Server
Monkey ServerMonkey Server
Monkey Server
Eduardo Silva Pereira
 
Red Hat Forum Benelux 2015
Red Hat Forum Benelux 2015Red Hat Forum Benelux 2015
Red Hat Forum Benelux 2015
Microsoft
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud Platform
Márton Kodok
 
Get the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - OverviewGet the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - Overview
ForgeRock
 
APIs at the Edge
APIs at the EdgeAPIs at the Edge
APIs at the Edge
Red Hat
 
AMF Flash and .NET
AMF Flash and .NETAMF Flash and .NET
AMF Flash and .NET
Yaniv Uriel
 
Lamp Zend Security
Lamp Zend SecurityLamp Zend Security
Lamp Zend Security
Ram Srivastava
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)
DoiT International
 
Node js introduction
Node js introductionNode js introduction
Node js introduction
Joseph de Castelnau
 

Similar to WebAssembly is Key to Better LLM Performance (20)

Node js meetup
Node js meetupNode js meetup
Node js meetup
 
Open Source XMPP for Cloud Services
Open Source XMPP for Cloud ServicesOpen Source XMPP for Cloud Services
Open Source XMPP for Cloud Services
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
 
Import golang; struct microservice - Codemotion Rome 2015
Import golang; struct microservice - Codemotion Rome 2015Import golang; struct microservice - Codemotion Rome 2015
Import golang; struct microservice - Codemotion Rome 2015
 
WASM Beyond the Browser [2022_07_21 meetup]
WASM Beyond the Browser [2022_07_21 meetup]WASM Beyond the Browser [2022_07_21 meetup]
WASM Beyond the Browser [2022_07_21 meetup]
 
Nodejs
NodejsNodejs
Nodejs
 
Where should I run my code? Serverless, Containers, Virtual Machines and more
Where should I run my code? Serverless, Containers, Virtual Machines and moreWhere should I run my code? Serverless, Containers, Virtual Machines and more
Where should I run my code? Serverless, Containers, Virtual Machines and more
 
Madrid meetup #7 deployment models
Madrid meetup #7   deployment modelsMadrid meetup #7   deployment models
Madrid meetup #7 deployment models
 
Monkey Server
Monkey ServerMonkey Server
Monkey Server
 
Red Hat Forum Benelux 2015
Red Hat Forum Benelux 2015Red Hat Forum Benelux 2015
Red Hat Forum Benelux 2015
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud Platform
 
Get the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - OverviewGet the Exact Identity Solution You Need - In the Cloud - Overview
Get the Exact Identity Solution You Need - In the Cloud - Overview
 
APIs at the Edge
APIs at the EdgeAPIs at the Edge
APIs at the Edge
 
AMF Flash and .NET
AMF Flash and .NETAMF Flash and .NET
AMF Flash and .NET
 
Lamp Zend Security
Lamp Zend SecurityLamp Zend Security
Lamp Zend Security
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)
 
Node js introduction
Node js introductionNode js introduction
Node js introduction
 

Recently uploaded

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 

Recently uploaded (20)

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 

WebAssembly is Key to Better LLM Performance