Tom Mason (Stability AI) - Computing Large Foundational Models

Hyper realistic future landscape horizon
Computing large
Foundational models

PROPRIETARY & CONFIDENTIAL
Generative AI’s impact
and capabilities are just
getting started

Internet 1997 - 2021 AI 2021 - 2030 (estimate)
Line drawing of the Guggenheim
Cathie Wood: AI market
could be $87 Trillion
Chart: enterprise value ($T, axis up to $90T) across the Computer age, Internet age and AI age
Segments: AI applications, foundation AI model-as-a-service APIs, AI hardware, AI software

The cost & time to produce any
content is rapidly approaching
zero
New classes of AI models are increasingly cheaper and better across
all modalities
Classic western movie
2022
30.0s | 2¢ to generate a high quality image with text
8.0s | 2¢ to generate 750 words of human level text
2023
0.5s | 0.2¢ to generate a high quality image with text
6.0s | 0.2¢ to generate 750 words of human level text
Prices and times quoted are industry benchmarks and not meant to be specific to Stability
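To put the slide's figures side by side, here is a minimal sketch that computes the speed-up and cost reduction between the two years. The pairing of each figure with 2022 or 2023 is read from the slide layout and should be treated as an assumption.

```python
# Figures from the slide; which year each one belongs to is an assumption
# inferred from the slide layout. Values are (seconds, cents).
BENCHMARKS = {
    "image": {"2022": (30.0, 2.0), "2023": (0.5, 0.2)},
    "750 words": {"2022": (8.0, 2.0), "2023": (6.0, 0.2)},
}


def improvement(task):
    """Return (speed-up factor, cost-reduction factor) from 2022 to 2023."""
    t_old, c_old = BENCHMARKS[task]["2022"]
    t_new, c_new = BENCHMARKS[task]["2023"]
    return t_old / t_new, c_old / c_new


for task in BENCHMARKS:
    speed, cost = improvement(task)
    print(f"{task}: {speed:.1f}x faster, {cost:.0f}x cheaper")
```

Under these assumptions the cost drops by roughly an order of magnitude for both modalities, while the latency gain is much larger for images than for text.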

Creative agencies face a multi-
billion dollar opportunity to
embrace AI or get left behind
Create more surreal experiences
Generative AI enables them to…
Serve more clients
Expand margins
AI supercharges creativity and output for every member
of the team so they can do more of their best work
Rapidly prototype and blend new creative ideas
across all content formats
By being able to…
Generate 100s of variants
in seconds and
maximize inspiration
Write proposals 3x faster
Enter novel creative realms
Create campaigns
personalized to each
client customer
Futuristic city with towering skyscrapers

Companies across the media landscape are
already racing to be the first to adopt
Next-generation AI
gains steam as Jasper
gets $1.5B valuation
PITCHBOOK | AI
How BBDO is Supercharging
the Creative Process With
Generative AI
ADWEEK | ARTIFICIAL INTELLIGENCE
Reinventing search with a new AI-
powered Microsoft Bing and
Edge, your copilot for the web
OFFICIAL MICROSOFT BLOG
It feels natural for us to dive head first into the potential
of AI … we are building on their legacy by exploring
how humans and machines work in harmony.
DDB EXECUTIVE GLEN LOMAS
Generative AI is an evolution of people working with
machines to create content. For us, the magic occurs
when you combine human insight – and cultural insight
– with this ability to generate content with machines …
This is WPP’s role. We apply these technologies,
combine them with insight, and help our clients grow.
STEPHAN PRETORIUS

Stability AI is the leader
in open-source,
enterprise, generative
AI for media companies

Built and supported by some
of the best names in AI
David Ha
HEAD OF STRATEGY
& RESEARCH
Robin Rombach
CO-INVENTOR
OF STABLE DIFFUSION
Patrick Hebron
VP OF PRODUCT R&D
Stanislav Fort
SENIOR RESEARCH SCIENTIST
Emad Mostaque
CEO
Tom Mason
CTO
Former CTO at
Chorus Intelligence
Ren Ito
COO
Former CEO of Mercari Europe,
Japan’s 1st Unicorn $7B IPO
Peter O’Donoghue
CFO
Former UK Head of Technology
and Audit Partner at Deloitte
Our investors:

Infrastructure:
Ezra-1
Unlike most startups, we have a critical
strategic asset: access to the Ezra-1
UltraCluster, one of the fastest
supercomputers globally, at a steep
discount to market value.
The combined UltraCluster has 48,000
cores, 576 TB RAM, 4 PB NVMe SSD and
4,000 A100s. We have 10 PB of high
speed FSx Lustre SSD storage that we
can scale well above 100 PB.
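As a rough illustration of how the quoted cluster figures relate to each other, the sketch below derives per-GPU and per-core ratios. It assumes the RAM figure means terabytes in decimal units (1 TB = 1000 GB); nothing here comes from Stability beyond the three headline numbers.

```python
# Headline figures from the slide. Decimal units are an assumption.
CORES = 48_000
RAM_TB = 576
GPUS = 4_000

cores_per_gpu = CORES / GPUS               # CPU cores per A100
ram_gb_per_core = RAM_TB * 1000 / CORES    # host RAM per CPU core, in GB
ram_gb_per_gpu = RAM_TB * 1000 / GPUS      # host RAM per A100, in GB

print(cores_per_gpu, ram_gb_per_core, ram_gb_per_gpu)
```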

Our 3 technology pillars
Platform API
Engineering
HPC
Data
Serving an optimised infrastructure and application
stack for our customers to use our models.
Technology pillars in a grassy field
RLHF
Applied ML
Fine-tuned Models
Custom Pipelines
Modality Teams
Research
Labelling
Foundation Models

Stable Diffusion is
redefining creativity
The silly monsters parade, 8k, hyper
details, rich colors, photograph
140K+
Stable Diffusion
community
Stable Foundation Discord
members include Artists, Beta
Testers & Developers
3B+
Images generated
since launch
125K
Reddit
subscribers
52K
Stable Diffusion
Github stars
341
Hugging Face Stable
Diffusion models
Japanese spaceship in the style
of a woodblock print

Stable Diffusion got over 50,000
GitHub stars in 150 days
Chart: # of stars on GitHub since repository was started (x-axis: days, 360 to 3,240; 50K marked on the y-axis)
Repositories compared: Transformers, Stable Diffusion, Cockroach, Ethereum, Bitcoin, Vercel

Open source =
platform value
Stable Diffusion is the
foundation layer
40M users
2M downloads
5M monthly
traffic
2M users/
month
15M images
generated
4M users/
month
41M downloads
Brutalist architecture on top of K2

Our new content model,
SDXL, was made for
professional media use
● More expressive: 2.4B parameters (3x more
than before)
● Easier to use: less complex prompting to get
beautiful outputs
● Enhanced image composition: greater
capability to produce and position legible text
● Wider breadth & depth of available styles:
better incorporation of photorealistic and other
applicable styles

Stability offers the full suite of AI
models tailored to enterprises
State-of-the-art models across
all media modalities
Stable Diffusion
DeepFloyd “IF” (Q1)
StableChat (Q2)
StableMusic (Q2)
Text-to-3D (Q4)
Text-to-Video (Q3)
Tailored to enterprises that care
about IP security & compliance
No saving or usage of
proprietary IP in training
Fully auditable model architecture
and dataset construction
Enterprise hosted SLAs &
support or on-premise open
source support
Interior of a spaceship
Timelines are estimates and subject to change

We’re building the default
models for every domain
Stable 3D
COMING Q4
Stable Music
COMING Q2
Stable Video
COMING Q3

Stability Animation Alpha-testers are already
demonstrating the model’s capabilities

Our newest models have
state-of-the-art
performance
DeepFloyd “IF”
COMING Q2
FID Scores
Stable Diffusion = 12.6
DALLE = 10.3
Google Imagen = 7.2
DeepFloyd IF = 6.6
(Lower is better)
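For readers unfamiliar with the metric: FID (Fréchet Inception Distance) measures the distance between two Gaussians fitted to image features of real and generated images, which is why lower is better. The sketch below is a simplified illustration, not how the scores above were computed. It assumes diagonal covariance so that it needs only the standard library, whereas full FID uses complete covariance matrices of Inception-network features.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two Gaussians with diagonal covariance.

    feats_a and feats_b are lists of equal-length feature vectors.
    With diagonal covariance the distance reduces, per dimension, to
    (mean difference)^2 + (standard deviation difference)^2.
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [f[d] for f in feats_a]
        col_b = [f[d] for f in feats_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        sd_a = math.sqrt(pvariance(col_a))
        sd_b = math.sqrt(pvariance(col_b))
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total


real = [[0.0, 1.0], [2.0, 3.0]]
generated = [[1.0, 1.0], [3.0, 3.0]]  # first feature shifted by 1
print(frechet_distance_diag(real, generated))
```

Identical feature sets give a distance of zero, and the distance grows as the generated distribution drifts from the real one.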

Realistic singing or spoken
voice conversion
Convert between
spoken voices…
The Interior of a spaceship
Original Converted
… and singing voices.

We have quickly become the standard across the
media ecosystem
Photoshop
plugin
Semantic
mixing
Seamless
3D textures
All integrated with
Stable Diffusion
Enterprises can leverage Stability’s
models via native integrations with the
largest cloud providers
Short-form creative
content
https://aws.amazon.com/blogs/machine-learning/stability-ai-builds-foundation-models-on-amazon-sagemaker/
https://stability.ai/blog/stability-ai-makes-its-stable-diffusion-models-available-on-amazons-new-bedrock-service

We have quickly become the standard across the
media ecosystem
Photoshop
plugin
Semantic
mixing
Seamless
3D textures
All integrated with
Stable Diffusion
Stability to be the default AI on every chip
Short-form creative
content

Deployment: Easily fine-tune Stable Diffusion on your data
using AWS Bedrock
Data on the
Virtual Private
Cloud (VPC)
Bedrock
Fine-tuned
Model
Encrypted data that
does not leave the
VPC and which will
not be used to train
the original base
model
Fine-tuning our
models for the desired
task without having to
annotate large
volumes of data
Soon, AWS Bedrock will allow for an easy fine-tuning process for various use cases without worrying about data privacy.
Our full suite of models will also be available for training on this API in the future.

Customers get the benefit of AWS SageMaker &
Stability’s models tightly integrated
Easily deploy, manage and fine-tune
Stability models at scale with optimized
infrastructure
Enterprise-level SLAs with 99.95%
uptime and downtime redundancy
Dedicated expertise and proof-of-concept
support from Stability-trained
SageMaker specialists
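To make the 99.95% uptime figure concrete, this small sketch converts an uptime percentage into the maximum downtime it permits. The measurement windows (a 30-day month and a 365-day year) are assumptions; the slide does not say over which period the SLA is measured.

```python
def allowed_downtime_minutes(uptime_pct, days):
    """Maximum downtime, in minutes, that still meets the uptime target."""
    return (1 - uptime_pct / 100) * days * 24 * 60


per_month = allowed_downtime_minutes(99.95, 30)          # minutes per 30-day month
per_year_hours = allowed_downtime_minutes(99.95, 365) / 60  # hours per 365-day year
print(f"{per_month:.1f} min/month, {per_year_hours:.2f} h/year")
```

On those assumptions the target leaves a little over twenty minutes of downtime per month.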

Platform API Docs Site
platform.stability.ai
Stability SDK
● Packaged / PyPI
● T2I, I2I, Inpainting
● Variants (models / upscalers)
TypeScript Client
● Helper functions
● Node.js support
Interfaces
● gRPC
● REST
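As an illustration of the REST interface listed above, the following sketch builds a text-to-image request without sending it. The host, path and JSON field names are assumptions based on the v1 REST API documented at platform.stability.ai and should be checked against the current docs; the API key and engine id are placeholders.

```python
import json
import urllib.request

# Assumed host and endpoint shape (v1 REST API); verify before use.
API_HOST = "https://api.stability.ai"


def build_text_to_image_request(prompt, api_key, engine_id, steps=30):
    """Build (but do not send) a text-to-image POST request."""
    body = {
        "text_prompts": [{"text": prompt}],  # assumed field name
        "steps": steps,
        "samples": 1,
    }
    return urllib.request.Request(
        url=f"{API_HOST}/v1/generation/{engine_id}/text-to-image",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_text_to_image_request(
    "Japanese spaceship in the style of a woodblock print",
    api_key="YOUR_API_KEY",        # placeholder
    engine_id="your-engine-id",    # placeholder, not a real engine name
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request; that step is left out here so the sketch runs without credentials.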

Platform Interfaces / SDK
REST API
● Generations
● Upscaler
Python / TypeScript SDK
● Examples
Discord Bots
● Demo new models (XL launched in Discord)
● Gather human feedback
Notebooks
● Gradio / Colab
● Support developers
History / Asset Service
● Asset storage S3/R2
● Persisted history
● User assets (e.g. fine-tuning)

Get better ideas faster with the
Stability Platform API
New concepts can be created by simply sketching a design and
pairing it with a text prompt using Stable Diffusion controllable networks
Input image
Output images

Stable Diffusion 2.0 Upscaler 4x

Presets
● New style presets
● Available via API using preset tag
● Use a combination of additional
positive prompts and negative
prompts
● Available in DreamStudio
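The idea of a preset as a bundle of extra positive and negative prompts can be sketched as follows. The preset names and prompt fragments are hypothetical and chosen only for illustration; they are not the presets shipped in the API or in DreamStudio.

```python
# Hypothetical presets: tag -> (extra positive prompt, negative prompt).
PRESETS = {
    "cinematic": (
        "cinematic lighting, film grain, shallow depth of field",
        "cartoon, flat lighting",
    ),
    "line-art": ("clean line art, monochrome ink", "photo, color, shading"),
}


def apply_preset(prompt, preset_tag=None):
    """Return (positive_prompt, negative_prompt) after applying a preset."""
    if preset_tag is None:
        return prompt, ""
    extra_positive, negative = PRESETS[preset_tag]
    return f"{prompt}, {extra_positive}", negative


positive, negative = apply_preset("a castle on a hill", "cinematic")
print(positive)
print(negative)
```

Keeping the expansion in a lookup table means a new style can be added without touching the generation code.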

Fine-Tuning API
- Evaluation > Dreambooth LoRA
- 130 seconds pre-processing + training time!
- Support for objects (e.g. animals) + styles
- Ingestion Pipeline (CLIPSeg)
- Deployment in SageMaker (training)
- Integration of API middleware
- Jobs requested through gRPC/REST
- Routed via queue
- Using SM training routines
- CloudWatch dashboard
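The request flow described above (jobs requested through the API, routed via a queue, then handed to a training routine) can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `FineTuneJob`, `submit`, `run_training` and `drain` are hypothetical names, and the training step is a placeholder rather than a SageMaker call.

```python
import queue
from dataclasses import dataclass


@dataclass
class FineTuneJob:
    job_id: str
    mode: str            # "object" or "style", the two modes on the slide
    image_paths: list
    status: str = "queued"


def submit(job_queue, job):
    """API middleware side: accept a request and put it on the queue."""
    job_queue.put(job)
    return job.job_id


def run_training(job):
    """Placeholder for the real training routine."""
    job.status = "trained"
    return job


def drain(job_queue):
    """Worker side: pull jobs off the queue in FIFO order and train them."""
    finished = []
    while not job_queue.empty():
        finished.append(run_training(job_queue.get()))
    return finished


jobs = queue.Queue()
submit(jobs, FineTuneJob("job-1", "style", ["a.png", "b.png"]))
submit(jobs, FineTuneJob("job-2", "object", ["dog1.png"]))
done = drain(jobs)
print([(j.job_id, j.status) for j in done])
```

Decoupling submission from training through a queue lets the API return immediately while workers process jobs at their own pace.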

Explore personalisation by leveraging your
universe of past photos to inspire the future
Collect a set of input images and use our fine-tuning API to
learn a “style” to then output similar creative concepts
Input style
Output images

Animation API
● Static image animation, video output
● Methods including 2D, 3D, 3D Warp, Video Init
● Frame interpolation
● Project storage in asset service
● Gradio Notebook
Prompt:
“A cyberpunk futuristic colourful crowded luxurious
pedestrian street avenue hi-tech at morning time,
blue neon lights, ray tracing, hdr, realistic shaded,
extremely detailed, sharp focus, soft lighting, sunny”
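To illustrate what frame interpolation means in the feature list above, here is a deliberately naive sketch that linearly blends two key frames. It is not the interpolation method used by the Animation API, which the slide does not specify; frames are modeled as flat lists of pixel intensities purely for illustration.

```python
def interpolate_frames(frame_a, frame_b, n_between):
    """Return n_between frames linearly blended between two key frames.

    Frames are flat lists of pixel intensities of equal length.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    frames = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)  # blend weight strictly between 0 and 1
        frames.append([(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)])
    return frames


middle = interpolate_frames([0.0, 0.0], [1.0, 2.0], 1)
print(middle)
```

Inserting intermediate frames this way raises the output frame rate without running the generative model for every frame.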

Soon, fine-tuning our animation models on specific characters and scenes will allow for
a quick animation production process, in addition to other visual effect capabilities
Our text-to-video and animation
technologies are rapidly evolving

Animation / AniMoji
emoji 1.0, photo stickers, personalized

Our recent competition with Peter Gabriel
#diffusetogether

Optimisation
OneFlow (Static Compilation)
● It is reasonably fast:
○ 56.98 iterations per second on A100 GPUs, over 2x faster than
Transformers.
○ Not the fastest compared to TensorRT (62.2 it/s) and Paddle
(68.2 it/s), but:
● Deployment-friendly, enough to justify the overhead:
○ Nice multi-resolution support through multiple graphs (see
https://github.com/Oneflow-Inc/diffusers/wiki/Optimization-for-Multi-Resolution-Picture
for details). Fewer warm-up instances needed for multi-resolution
and less middleware complexity.
○ Fast compilation: a couple of seconds vs. 10+ mins.
○ Can dynamically load weights, nice for custom models.
○ Easy to modify network; something like ControlNet or loading LoRA
weights is not a huge hassle (unlike AIT and TensorRT, where it is
basically impossible without re-compilation).
○ Mocking torch environment, low maintenance once deployed.
○ Rapid ongoing development.
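The throughput figures above translate into per-image latency once a step count is fixed. The sketch below does that conversion; the 50-step count is an assumption chosen for illustration, and only the time spent in the denoising loop is modeled.

```python
# Throughput figures quoted on the slide, in denoising iterations per second.
ITERATIONS_PER_SECOND = {"OneFlow": 56.98, "TensorRT": 62.2, "Paddle": 68.2}


def seconds_per_image(backend, steps=50):
    """Denoising-loop time for one image; the default step count is assumed."""
    return steps / ITERATIONS_PER_SECOND[backend]


for name in ITERATIONS_PER_SECOND:
    print(f"{name}: {seconds_per_image(name):.2f} s per 50-step image")
```

At this step count the gap between the backends is on the order of a tenth of a second per image, which is the trade-off the slide weighs against deployment convenience.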
Upcoming
Frameworks
Multiple targets / control planes

Integrations

DreamStudio
● Framework
○ Migration to React
○ New UI/UX
● Generation interface
● Editor
● Presets
● Back-End
○ History
○ Asset Service

Multiple Generations Styles

Variations
Infinite History

Stable Diffusion 2.0 Depth to Image
TRANSFORM IMAGES DYNAMICALLY

Depth to Image
TRANSFORM IMAGES DYNAMICALLY

Stable Diffusion Inpainting
CHANGE IMAGE DETAILS

Stable Diffusion v2.1: “Professional photograph of Game of Thrones as a Japanese drama…”

Stability’s content models & surrounding infrastructure will
soon enable full control of creative outputs
Content models with enough flexibility to get exact outputs while still
providing creative exploration
Create custom styles &
personalized content models
based on private assets with
Stability’s fine-tuning
infrastructure
Leverage SDXL variants
optimized for specific use-cases
& stylistic outputs e.g. cinematic,
Decide which parts of images to
keep constant and which to
change along a variety of
dimensions (depth, pose,
boundary, etc.) through custom
SDXL adapters
Allow users to modify images
using natural language with
Stability Instruct models

Tap into global markets automatically with
soon to be released fully controllable models
Output images
International
cultural themes
Today
Use a base design and
pass into Stable Diffusion
img2img for quick
concepting and then
finalize in software of
choice
Q2
Use custom T2I adapters
with our latest model
(Stable Diffusion XL) to
make fine-grained
adjustments with multiple
control types
Input image

Sample POCs that can be executed in 2023
Film & TV Creative
Concepting
Get to better ideas faster by concepting thousands of creative ideas
in minutes and leveraging your past universe of creative work
H1 2023
Advertising Collateral
& Thumbnails
Save time & money on marketing by leveraging production
content to automatically generate first-pass marketing
collateral & thumbnails
H1 2023
Script & Storyline
Assistance
Get feedback and ideas on scripts with an evision-specific
large language model
H2 2023
Post Video Production
Augmentation
Remove the need for extra shoots by automating post-
production touch-ups
H2 2023
Instant International
Dubbing
Reach new audience segments by allowing any content to be
instantly dubbed in another language
H2 2023
Timelines are estimates and subject to change

Minimize re-screens with post-production scene
augmentation
Empower editors to experiment across all
content dimensions
Scenes & Props (SDXL & Controllable Networks)
Dialogue, Sounds & Interactions (StableVoice & StableMusic)
1. Fine-tune a custom Stable Diffusion model on a chosen
repository of production assets
2. Enable members of the post production team to issue
natural language commands on a subset of frames to
test changes such as scene and prop alterations
3. Utilize these changes directly in production or to more
efficiently target re-screens
Streamline post production edits with natural-
language based editing
Lush jungle background
Urban town scene

The next generation of film, TV, animation & music
will be redefined by Generative AI
Dynamic & Interactive Content
Personalized to Consumers
● Real-time adaptation of movies & shows with characters, scenes
and whole storylines generated on-the-fly.
● Seamless dubbing and content translation to facilitate global
accessibility and engagement.
● Hyper-personalized music & voice mixing to create the perfect
composition.
Massive Leverage to Creators who have
the Best Visions
● Large-scale ideation enabled across all building blocks of movie
/ show development.
● Democratized access to easy-to-use tooling to go from concept
to high-quality content.
● Novel music and voices generated via a combination of text and
existing tooling.
“Create a Jumanji themed world for my 5 years old son”

The content production & development pipeline in the
Generative AI Era will be rapid and efficient
Content production Character development Scene creation Audio effects and dubbing
Fine-tune StableLM on relevant
stories and characters for daily
content production
Fine-tune Stable Diffusion on your
characters for further
development
Create stunning videos and
animations centered on chosen
characters and scenes
Fast and low-cost dubbing using
our audio models

Create the perfect character using multiple variations
A Princess
An Arabian Princess
Add a desert
background
The chosen character

Direct the model to elaborate and develop the
content
Deploy the customized model to help create
characters and plots based on given themes
and genres
StableLM will be used for script writing, content creation,
and character development
As a child, Princess Farah’s father, High-King of the Rub’ Al
Khali’s seven Emirates, died under mysterious circumstances.
The king had previously banned his brother Oday from ever being
a rightful heir due to the latter’s crimes and corruption in the
kingdom. However, Oday took the chance and declared himself
king, banishing Princess Farah to the deep valleys of a distant
land.
Help me create a story of a female Arabian
princess. Main themes should be family and
unity
Give me a general outline of what the plot will
look like
An old sage learns of Farah’s identity from the King’s Mark on her
right palm. He then tells her the prophecy of a princess who
retrieves the lost Staff of Unity to reunite a kingdom in times of
deep turmoil. However, in the process, the princess sacrifices the
most precious gift of love to save the kingdom. Farah then makes
it her life’s goal to fulfill her destiny.
Fine-tune StableLM on various characters
and plots

Instant access to interactive, human-like
characters is becoming widely accessible
Large language models provide on-demand
conversation & action simulation
● Accessible: Easily interactive as if talking to
another person
● Personalized: Provide context-dependent
answers & reasoning
● Knowledgeable: Understand storylines, world-
building & fundamental subjects
● Customizable: Able to leverage external tools
and knowledge bases to take on new functions
& personalities
“Come on, John. Let’s get out of here”
Follow her Go back

Supercharge creative abilities and keep
IP safe with multiple ways to partner
Be among the first to know and
integrate new models
• Early access to new foundation
models and model updates.
• Dedicated 1x1 “Ask me
anything” each month.
• Shared Stability Slack channel
for asynchronous assistance.
Access enterprise-grade, hosted
APIs of Stability AI’s models.
• Model and fine-tuning APIs.
• Enterprise support & SLAs
via Amazon SageMaker.
• Usage-based, tiered
pricing.
• No research usage of
proprietary data.
Leverage Stability’s AI engineers
to build custom models based on
Meitu’s unique assets
• Meitu specific styles and
models created based on past
assets. Both design and
language.
• Pipelines set up to enable
continual updates and
modifications.
• Hosted on-prem or via
SageMaker.
Stability Hosted
API
Preferred Access
Program
Custom
Models

Rocket blasting into space
The future of creativity
is already here

Integrate generative AI correctly, efficiently & safely with
Stability’s three-pronged approach
1. Scoping
Find the right use-case & implementation
strategy for today while planning for
tomorrow
• Audit evision’s internal processes and
internal data to determine feasible &
quick high ROI tasks to triage.
• POCs mutually agreed upon based on
a combination of ROI, speed of testing,
ease of integration / deployment,
dataset availability and technological
maturity.
2. POC Creation & Evaluation
Create the appropriate POC creation &
evaluation pipeline for integration
• Dataset aggregation and labeling
pipeline is created for any custom
model work needed. All models hosted
on-prem or via AWS SageMaker.
• POC sandbox for evaluation is set up
for frequent Stability & evision review
sessions on POC progress and needed
adjustments.
3. Integration & Deployment
Integrate POC throughout relevant orgs
in a phased approach while restarting the
process for the next POC
• evision and Stability work together to
conduct a secure, phased roll-out with
the appropriate quality assurance &
reporting procedures.
• Scoping starts on the next set of
workflows & experiences to tackle.
• Frequent cadence of research and
engineering previews to keep you up to
date on what’s coming.
