WebAssembly is Key to Better LLM Performance

Samy Fodil

How WebAssembly can be used to optimize and accelerate Large Language Model inference in the cloud.

Samy Fodil
Founder & CEO of Taubyte
WebAssembly is Key to Better LLM Performance

Taubyte
An Open Source Cloud Platform on Autopilot, where coding in a local environment equals scaling to global production. 🚀
Local Coding
Global Production

Large Language Models
Proprietary
Open Source
✅Train
✅Host

LLM Inference
Problems
📦Huge PyTorch Images
🔗Complex Dependencies
🔒Lock GPU resources
🗿Primitive Orchestration

LLM Inference
WASM to the rescue
🪽Lightweight
🪣Sand-boxed
⚡Easy to orchestrate

🪽 Lightweight
⚡Fast to provision
🌱Cheap to cache
Closer to Data (Data Gravity) & User (Edge Computing)
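
To make "fast to provision" concrete, here is a rough back-of-the-envelope sketch. It assumes that the deck's two sizes (under 2 MB and about 4 GB) refer to a WASM module and a PyTorch-based image respectively, and it assumes a hypothetical 1 Gbit/s link with no latency or protocol overhead. Both are illustrative assumptions, not measurements.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal time to move size_bytes over a link, ignoring latency and overhead."""
    return size_bytes * 8 / link_bits_per_second


MB = 10**6
GB = 10**9
LINK = 10**9  # assumed 1 Gbit/s link (hypothetical)

wasm_module = 2 * MB    # upper bound taken from the "<2MB" figure
pytorch_image = 4 * GB  # taken from the "~4GB" figure

wasm_time = transfer_seconds(wasm_module, LINK)
image_time = transfer_seconds(pytorch_image, LINK)
ratio = pytorch_image / wasm_module

print(f"WASM module:   {wasm_time * 1000:.0f} ms")
print(f"PyTorch image: {image_time:.0f} s")
print(f"Size ratio:    {ratio:.0f}x")
```

Under these assumptions the module moves in milliseconds and the image in tens of seconds, which is why a small artifact is cheaper to cache near the data or the user.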

🪣Sand-boxed
🛡️Secure
🕹️Interfaces
Which means no, restricted, virtual, or mocked access to networking, the filesystem, and more
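
One way to picture "no, restricted, virtual or mocked" interfaces: sandboxed code never touches the real system, it only sees whatever object the host hands it. The sketch below is illustrative Python rather than real WASM bindings, and `DeniedFS`, `VirtualFS`, and `guest_module` are hypothetical names, not Taubyte APIs.

```python
class DeniedFS:
    """'No' access: every call is refused."""

    def read(self, path):
        raise PermissionError(f"filesystem access denied: {path}")


class VirtualFS:
    """'Virtual' access: an in-memory file table stands in for the disk."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]


def guest_module(fs):
    """Stand-in for sandboxed code: it can only use the interface it is given."""
    try:
        return fs.read("/prompt.txt").decode()
    except PermissionError:
        return "<no filesystem>"


virtual_result = guest_module(VirtualFS({"/prompt.txt": b"hello"}))
denied_result = guest_module(DeniedFS())
print(virtual_result, denied_result)
```

The same guest code runs unchanged in both cases; only the host's choice of interface differs.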

LLM Inference
You can use transformers directly from Python. This pulls in PyTorch and other dependencies.
📦Huge PyTorch Images
🔗Complex Dependencies
🔒Lock GPU resources

LLM Inference
You can, for example, use ONNX, llama.cpp, candle (which can run llama.cpp-style quantized models) or TensorRT-LLM and have a lower footprint. But it comes with a few challenges:
🔒Lock GPU resources
🧠Way harder to implement
Scaling LLM Inference (lvl1)
🎹Orchestration
🌱Caching

Inference Engines for LLM
🤹Load balancing?
🚦Routing?

With WASM
If we make LLMs, and AI in general, available through a set of common host calls, we can combine the benefits of:
Inference Engines
🎹Orchestration
🌱Caching
WebAssembly
🪽Lightweight
🪣Sand-boxed
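
A minimal sketch of the host-call idea, written in Python for readability rather than as real WASM bindings. The guest code only knows a call name, and the host decides which backend serves it. `HostCalls`, `llm.generate`, and `echo_backend` are hypothetical names for illustration and are not part of tau.

```python
class HostCalls:
    """Registry of functions the host exposes to sandboxed modules."""

    def __init__(self):
        self._calls = {}

    def register(self, name, fn):
        self._calls[name] = fn

    def call(self, name, *args):
        if name not in self._calls:
            raise LookupError(f"host call not available: {name}")
        return self._calls[name](*args)


def echo_backend(prompt):
    """Placeholder backend; a real host would forward to an inference engine."""
    return f"echo: {prompt}"


def guest_handler(host, prompt):
    """Stand-in for module code: no model weights, no PyTorch, just a host call."""
    return host.call("llm.generate", prompt)


host = HostCalls()
host.register("llm.generate", echo_backend)
reply = guest_handler(host, "hi")
print(reply)
```

Because the module depends only on the call name, the host can swap the backend without rebuilding the module.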

github.com/taubyte/tau
Will provide:
🤹Load-balancing
🚦Routing
🎹Orchestration
It’ll also provide abstractions so what’s built locally will work
in production with no changes.

The idea
[Diagram. Labels: APP, HTTP, Inference Engine, WASM MODULE, HOST CALL, Inference Host Module; two parts marked 1️⃣ and 2️⃣.]

Implementation of 1️⃣
tau implements a protocol called `gateway` that determines which host is best suited to serve the request, based on:
- WebAssembly caching and dependency module availability (including host modules)
- Host Module resource availability
- Host resource availability
- Other constraints defined by the developer, such as data gravity
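
The selection criteria above could be combined roughly as follows. This is a sketch under assumptions: the field names, the hard filter on a developer-defined region constraint, and the equal-weight sum are all hypothetical and are not taken from tau's `gateway` implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    region: str
    module_cached: bool      # WebAssembly module and its dependencies are cached
    host_module_free: float  # 0..1, host module resource availability
    host_free: float         # 0..1, host resource availability


def pick_host(candidates, required_region=None):
    """Return the best candidate, or None if no host satisfies the constraint."""
    eligible = [
        c for c in candidates
        if required_region is None or c.region == required_region
    ]
    if not eligible:
        return None

    def score(c):
        cached = 1.0 if c.module_cached else 0.0
        return cached + c.host_module_free + c.host_free

    return max(eligible, key=score)


hosts = [
    Candidate("a", "eu", module_cached=False, host_module_free=0.9, host_free=0.9),
    Candidate("b", "eu", module_cached=True, host_module_free=0.5, host_free=0.6),
    Candidate("c", "us", module_cached=True, host_module_free=1.0, host_free=1.0),
]
best_anywhere = pick_host(hosts)
best_in_eu = pick_host(hosts, required_region="eu")
print(best_anywhere.name, best_in_eu.name)
```

Treating the developer constraint as a filter rather than a score keeps a data-gravity rule from being outvoted by a well-cached host in the wrong place.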

[Diagram: Node Running Gateway Protocol; MUXED TUNNEL links to three Serving Nodes (Substrate); ORBIT PROTO links to three Satellites (i.e. LLM Inference).]

Host Module Resource Availability
This feature is still to be implemented. It will ask the Host Module to provide basic metrics for the gateway:
- Caching score. Example: is the particular model loaded?
- Resource availability score. Example: is there enough GPU memory to spin up the model?
- Queue score. Example: if the Host Module uses queues, how full is the queue?
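
Since this feature is described as not yet implemented, the following is only a guess at what the three scores might look like. All names, the 0-to-1 ranges, and the formulas are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostModuleState:
    loaded_models: set
    free_gpu_bytes: int
    queue_length: int
    queue_capacity: int


def caching_score(state, model):
    """1.0 if the model is already loaded, else 0.0."""
    return 1.0 if model in state.loaded_models else 0.0


def resource_score(state, model_gpu_bytes):
    """1.0 if there is enough free GPU memory to spin up the model, else 0.0."""
    return 1.0 if state.free_gpu_bytes >= model_gpu_bytes else 0.0


def queue_score(state):
    """Fraction of the queue that is still empty (1.0 means idle)."""
    if state.queue_capacity <= 0:
        return 0.0
    used = min(state.queue_length, state.queue_capacity)
    return 1.0 - used / state.queue_capacity


GIB = 2**30
state = HostModuleState({"llama"}, free_gpu_bytes=8 * GIB,
                        queue_length=3, queue_capacity=12)
metrics = {
    "caching": caching_score(state, "llama"),
    "resources": resource_score(state, 6 * GIB),
    "queue": queue_score(state),
}
print(metrics)
```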

github.com/taubyte/vm-orbit
Extending the WebAssembly runtime in a secure way.
[Diagram: orbit and an external process; PROXY WASM CALL; PROXY ACCESS TO MODULE MEMORY.]
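
A toy model of the two proxied operations named on the slide: a proxied call and proxied access to module memory. It is not vm-orbit's API. `ModuleMemory`, `MemoryProxy`, and `satellite_upper` are hypothetical, and a real satellite would run in a separate process rather than in the same interpreter.

```python
class ModuleMemory:
    """Stand-in for a WASM module's linear memory."""

    def __init__(self, size):
        self.data = bytearray(size)


class MemoryProxy:
    """What the external process sees: bounds-checked reads and writes only."""

    def __init__(self, memory):
        self._memory = memory

    def _check(self, ptr, length):
        if ptr < 0 or length < 0 or ptr + length > len(self._memory.data):
            raise IndexError("access outside module memory")

    def read(self, ptr, length):
        self._check(ptr, length)
        return bytes(self._memory.data[ptr:ptr + length])

    def write(self, ptr, payload):
        self._check(ptr, len(payload))
        self._memory.data[ptr:ptr + len(payload)] = payload


def satellite_upper(proxy, in_ptr, in_len, out_ptr):
    """Proxied call: arguments are pointers, data moves through the proxy."""
    result = proxy.read(in_ptr, in_len).upper()
    proxy.write(out_ptr, result)
    return len(result)


memory = ModuleMemory(64)
memory.data[0:5] = b"hello"  # the module places its input at offset 0
written = satellite_upper(MemoryProxy(memory), 0, 5, 16)
output = bytes(memory.data[16:16 + written])
print(output)
```

The external code never holds the memory itself, only a proxy that refuses out-of-range access.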

github.com/ollama-cloud
Turning ollama into a satellite

Install dreamland
Start local Cloud
Attach plug-in

No SDK!
Check my previous project
github.com/samyfodil/taubyte-llama-satellite
Which actually has a nice SDK!

In production
- Copy the ollama plugin to /tb/plugins
- Add it to the config

taubyte/dllama
Ready for Cloud with better backends
Always local friendly!

WebAssembly is Key to Better LLM Performance

1. Samy Fodil Founder & CEO of Taubyte WebAssembly is Key to Better LLM Performance

2. Samy Fodil Founder & CEO of Taubyte

3. Taubyte An Open Source Cloud Platform on Autopilot, where coding in local environment equals scaling to global production. 🚀 Local Coding Global Production

4. Taubyte Locally

5. Taubyte In Prod

6. https://tau.how

7. LLMs Generative AI

8. INPUT Large Language Models OUTPUT

9. Proprietary Open Source ✅Train ✅Host Large Language Models

10. LLM Training

11. LLM Inference Problems: 📦Huge PyTorch Images 🔗Complex Dependencies 🔒Lock GPU resources 🗿Primitive Orchestration

12. LLM Inference: WASM to the rescue 🪽Lightweight 🪣Sand-boxed ⚡Easy to orchestrate

13. 🪽 Lightweight <2MB ~4GB

14. 🪽 Lightweight ⚡Fast to provision 🌱Cheap to cache Closer to Data (Data Gravity) & User (Edge Computing)

15. 🪣Sand-boxed 🛡️Secure 🕹️Interfaces Which means No, Restricted, Virtual or Mocked: Networking, Filesystem and more

16. LLM Inference You can use transformers directly from Python. This pulls in PyTorch and other dependencies. 📦Huge PyTorch Images 🔗Complex Dependencies 🔒Lock GPU resources

17. LLM Inference You can, for example, use onnx, llama.cpp, candle (which supports llama.cpp's quantized model formats) or TensorRT-LLM and have a lower footprint. But it comes with a few challenges: 🔒Lock GPU resources 🧠Way harder to implement

18. Scaling LLM Inference (lvl1) 🎹Orchestration 🌱Caching

19. Inference Engines for LLM 🤹Load balancing? 🚦Routing?

20. With WASM If we make LLMs, and AI in general, available through a set of common host calls, we can combine the benefits of Inference Engines (🎹Orchestration 🌱Caching) and WebAssembly (🪽Lightweight 🪣Sand-boxed)
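As a rough illustration of what a "common host call" surface could look like from the module's side, here is a minimal Go sketch. The `Inference` interface, its `Generate` method, and the `mockHost` type are hypothetical names introduced for this example and are not tau's API; the mock stands in for a real engine, in the spirit of the mocked interfaces a sandbox allows.

```go
package main

import (
	"fmt"
	"strings"
)

// Inference is a hypothetical host-call surface for text generation.
// A module codes against it without knowing which engine sits behind it.
type Inference interface {
	// Generate passes each token for prompt to onToken as it is produced.
	Generate(prompt string, onToken func(token string)) error
}

// mockHost is a stand-in engine that echoes the prompt word by word.
type mockHost struct{}

func (mockHost) Generate(prompt string, onToken func(string)) error {
	for _, word := range strings.Fields(prompt) {
		onToken(word)
	}
	return nil
}

// Complete collects the streamed tokens into a single string.
func Complete(engine Inference, prompt string) (string, error) {
	var tokens []string
	err := engine.Generate(prompt, func(t string) { tokens = append(tokens, t) })
	return strings.Join(tokens, " "), err
}

func main() {
	out, err := Complete(mockHost{}, "hello from wasm")
	fmt.Println(out, err)
}
```

Because the module only sees the interface, the same code can run against a mock locally and a real engine in production.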

21. 🤹Load balancing? 🚦Routing?

22. github.com/taubyte/tau will provide: 🤹Load-balancing 🚦Routing 🎹Orchestration. It will also provide abstractions so that what is built locally works in production with no changes.

23. The idea [diagram]: APP, Inference Engine, WASM MODULE, Inference Host Module; links labeled HTTP and HOST CALL; markers 1️⃣ and 2️⃣

24. Implementation of 1️⃣ tau implements a protocol called `gateway` that determines which host is best suited to serve the request, based on: WebAssembly caching and dependency module availability (including host modules); Host Module resource availability; Host resource availability; other constraints defined by the developer, such as data gravity
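The selection criteria above can be sketched as a filter followed by a preference. This is only an illustration under stated assumptions: the `Host` and `Request` types, the `Candidates` function, and the use of a region string as the data-gravity constraint are all hypothetical, and treating host modules as a hard requirement while treating a cached WebAssembly module as a preference is a design choice of this sketch, not a description of tau's `gateway` protocol.

```go
package main

import (
	"fmt"
	"sort"
)

// Host describes a serving node as seen by a hypothetical gateway.
type Host struct {
	Name        string
	CachedWasm  map[string]bool // WebAssembly modules already cached
	HostModules map[string]bool // host modules available on the node
	Region      string          // stand-in for a data-gravity constraint
}

// Request lists what is needed to serve a call.
type Request struct {
	Wasm        string
	HostModules []string
	Region      string // empty means no constraint
}

// Candidates drops hosts that miss a required host module or violate the
// region constraint, then orders hosts with the module cached first.
func Candidates(hosts []Host, req Request) []string {
	var eligible []Host
	for _, h := range hosts {
		if req.Region != "" && h.Region != req.Region {
			continue
		}
		ok := true
		for _, m := range req.HostModules {
			if !h.HostModules[m] {
				ok = false
				break
			}
		}
		if ok {
			eligible = append(eligible, h)
		}
	}
	sort.SliceStable(eligible, func(i, j int) bool {
		return eligible[i].CachedWasm[req.Wasm] && !eligible[j].CachedWasm[req.Wasm]
	})
	names := make([]string, 0, len(eligible))
	for _, h := range eligible {
		names = append(names, h.Name)
	}
	return names
}

func main() {
	hosts := []Host{
		{Name: "a", HostModules: map[string]bool{"ollama": true}, Region: "eu"},
		{Name: "b", HostModules: map[string]bool{"ollama": true}, Region: "eu",
			CachedWasm: map[string]bool{"fn.wasm": true}},
	}
	req := Request{Wasm: "fn.wasm", HostModules: []string{"ollama"}, Region: "eu"}
	fmt.Println(Candidates(hosts, req))
}
```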

25. [Diagram] Node Running Gateway Protocol; three Serving Nodes (Substrate); three Satellites (i.e. LLM Inference); link labels: MUXED TUNNEL (×3), ORBIT PROTO (×3)

26. Host Module Resource Availability This is a yet-to-be-implemented feature that will ask the Host Module to provide basic metrics for the gateway: Caching score (example: is the particular model loaded?); Resource availability score (example: is there enough GPU memory to spin up the model?); Queue score (example: if the Host Module uses queues, how full is the queue?)
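Since this feature is described as not yet implemented, the following is only one way the three scores might be combined by a gateway. The `HostMetrics` type, the normalization of every score to the range 0 to 1, and the weights are assumptions made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// HostMetrics is a hypothetical report from a Host Module.
// All scores are assumed to be normalized to [0, 1].
type HostMetrics struct {
	Host      string
	Caching   float64 // 1 if the requested model is already loaded
	Resources float64 // share of GPU memory still free
	Queue     float64 // how full the request queue is (1 = full)
}

// Score combines the three metrics; the weights are arbitrary assumptions.
// A fuller queue lowers the score, so it enters as (1 - Queue).
func (m HostMetrics) Score() float64 {
	return 0.5*m.Caching + 0.3*m.Resources + 0.2*(1-m.Queue)
}

// PickHost returns the host with the highest combined score.
func PickHost(candidates []HostMetrics) (string, error) {
	if len(candidates) == 0 {
		return "", errors.New("no candidate hosts")
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.Score() > best.Score() {
			best = c
		}
	}
	return best.Host, nil
}

func main() {
	hosts := []HostMetrics{
		{Host: "node-a", Caching: 0, Resources: 0.9, Queue: 0.1},
		{Host: "node-b", Caching: 1, Resources: 0.4, Queue: 0.5},
	}
	host, err := PickHost(hosts)
	fmt.Println(host, err)
}
```

With these weights, a node that already has the model loaded wins over an idle node that would have to load it, which matches the intent of a caching score.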

27. github.com/taubyte/vm-orbit Extending the WebAssembly runtime in a secure way. [Diagram: orbit, external process; labels: PROXY WASM CALL, PROXY ACCESS TO MODULE MEMORY]
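WebAssembly host calls generally pass data as offsets into the module's linear memory, so anything proxying those calls has to read and write that memory on the module's behalf. Below is a minimal sketch of the bounds-checked access this involves, using a plain byte slice as a stand-in for module memory. The `Memory` type and its methods are hypothetical and are not vm-orbit's API.

```go
package main

import (
	"errors"
	"fmt"
)

// Memory is a stand-in for a module's linear memory.
type Memory struct{ data []byte }

// ErrOutOfBounds is returned when an access falls outside the memory.
var ErrOutOfBounds = errors.New("access outside module memory")

// Read copies size bytes starting at offset, rejecting out-of-range access.
func (m *Memory) Read(offset, size uint32) ([]byte, error) {
	end := uint64(offset) + uint64(size) // widen to avoid uint32 overflow
	if end > uint64(len(m.data)) {
		return nil, ErrOutOfBounds
	}
	out := make([]byte, size)
	copy(out, m.data[offset:end])
	return out, nil
}

// Write copies buf into memory at offset, rejecting out-of-range access.
func (m *Memory) Write(offset uint32, buf []byte) error {
	end := uint64(offset) + uint64(len(buf))
	if end > uint64(len(m.data)) {
		return ErrOutOfBounds
	}
	copy(m.data[offset:end], buf)
	return nil
}

func main() {
	mem := &Memory{data: make([]byte, 16)}
	if err := mem.Write(4, []byte("prompt")); err != nil {
		fmt.Println(err)
		return
	}
	got, err := mem.Read(4, 6)
	fmt.Println(string(got), err)
	_, err = mem.Read(12, 8) // crosses the end of memory
	fmt.Println(err)
}
```

Returning a copy from `Read` keeps the caller from holding a reference into module memory, which is the conservative choice when the caller lives in another process.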

28. Example

29. github.com/ollama-cloud turning ollama into a satellite

30. Compile llama.cpp; build plugin

31. Install dreamland; start local cloud; attach plug-in

32. Login to Local Cloud

33. Create a project

34. Create a function

35. Call Generate; stream tokens

36. No SDK! Check my previous project github.com/samyfodil/taubyte-llama-satellite Which actually has a nice SDK!

37. Trigger the function

38. In production: copy the ollama plugin to /tb/plugins; add it to the config