When I was a kid, I wanted to build a holodeck—the immersive 3D simulation system from Star Trek… so I started making games.
This is a vision of how close we are to a holodeck:
Generative AI
Compositional frameworks
Computational scaling
1. “Computer, make
me {anything}”
The direct-from-imagination era
12 charts showing the trends
20 products & papers surveyed
Jon Radoff
Metavert
2. If you wanted to build
a holodeck…
• You’d need a way to generate and compose ideas:
“Computer, make me a fantasy world with elves
and dragons… except made of Legos.”
• You’d need a way to visualize the experience.
Physics and realistic light simulation (ray tracing)
• You’d need a way to have a persistent world with
data, continuity, rules, systems.
(Yes, you’d also need a way to deliver the experience of the world
convincingly. In Star Trek, the simulation had physical force-feedback. I’ll touch
on that at the end, because some of what is under development is even more
compelling than that.)
3. “Imagination is more
important than
knowledge. For
knowledge is limited,
whereas imagination
embraces the entire
world, stimulating
progress, giving birth
to evolution.”
– Albert Einstein, 1929 interview
with Sylvester Viereck
But sharing your imagination can be hard. Still, we
know from the evidence that people yearn to do
so…
4. ChatGPT is really a
Virtual World Engine
ChatGPT is fun. It is playful. It is a creative palette.
From its pairing of a large language model with an RLHF
(reinforcement learning from human feedback) system spring forth the
dreams of virtual worlds.
Above from: “Creating a Text Adventure Game with ChatGPT” bit.ly/AdventureGPT
5. Lensa: accessing basic human desires
Lensa grew to tens of
millions of dollars in revenue
in only a few weeks. It lets you
imagine different versions of
yourself and share them with
your friends, gratifying our
egos and our creativity.
Its enabling technology,
Stable Diffusion, is
disruptive because it
dramatically reduces the
cost of generating artwork,
enabling new use cases like
Lensa.
6. *Overly simplified,
glossing over iterative
nature of game
development which forms
return-loops to revisit and
evolve along the way; or
the meta-aspects to
support scale such as QA,
producers, etc.; and some
content types (like all the
audio).
Why Building Worlds is (currently) Hard*
[Diagram: production steps leading to Games, Experiences, Simulations and the “Metaverse”]
• Idea / Sketch; Concept Art; 3D Modeling; Paint Textures, UV Unwrap; Rig, Animate; Optimize Geometry; Environments; Lighting, Baking; Compose into World
• Narrative: Ideation, Vision; Stories, Dialog, Characters, Setting
• Code: Designs, Prototypes; Data, Content, System Design; Game Frontend; Game Backend; Build & Deployment; Live Ops
7. Compositional Frameworks
ChatGPT “works” as a product not because it “gives the right answer” all the time, but
because the experience is magical, playful and co-creative.
It lets you play with your own imagination.
But making it easy to share and build bigger worlds we can explore together requires a
compositional framework. These are the technologies that allow you to go from
imagination-to-screen.
Let’s look at what exists so far…
8. Minecraft
Sandbox for creative expression
Minecraft is not only a creative
tapestry for individuals – it is a
space of shared imagination where
people compose vast worlds.
Screens here are taken from Divine
Journey 2, a colossal modpack
composed of many other mods and
deployed on servers for players to
experience together:
[Diagram: Mod 1, Mod 2, … Mod N are composed into the Divine Journey 2 modpack, which is deployed on servers]
9. Roblox
Walled garden for virtual worlds
Many of the most popular
experiences of Roblox are not
“games” in the traditional sense.
Many would not have gotten
greenlit in the mainstream game
publishing business – but in a
shared space of creativity, new
types of virtual worlds flourish.
10. Dungeons
& Dragons
One of the original
compositional frameworks—
indeed, the earliest
metaverse—was D&D. It has
rules and structure for
shared imagination and
persistent, virtual worlds
(campaigns).
Online platforms facilitate
live interaction for D&D,
while Generative AI is
unlocking on-demand
visualization for D&D.
Images above and right generated in Midjourney.
11. 3D Engines
A decade ago, if you wanted to build
an immersive world in 3D, you’d
need to know a lot about graphics
APIs and matrix math.
For people who couldn’t realize their
creativity in a sandbox or walled
garden, platforms like Unreal and
Unity enable the creation of
real-time, immersive worlds that
simulate reality.
Image from Unreal Engine 5.1
12. Persistent Worlds
3D engines provide a window into a
world. But the memory of what
happens in a world—the history,
economy, social structure—as well as
the rules that undergird a world,
require a means of achieving
consensus between all participants.
Walled gardens like Roblox do this for
you, but large-scale worlds have
required large engineering teams
who build from scratch.
13. You will speak worlds
into existence:
Compositional Frameworks will use generative
AI to accelerate the worldbuilding process;
begin with words, refine with words.
Physics-based methods such as ray tracing will
simplify the creative process while delivering
amazing experiences.
Generative AI will become part of the loop of
games and online experiences, creating
undreamt-of interactive forms.
Compute-on-demand will enable scalable,
persistent worlds with whatever structure the
creator imagines.
“In the beginning there was the Word…”
14. Parallel
Computation
Computers can dream of
worlds—and we can see into
them—due to advances in
parallel computing.
The next few slides will explain
the exponential rise in
computation—in your devices
and in the cloud—driving the
direct-from-imagination
revolution, and then return to
what the near-term future has in
store.
15. • Most 2022 phones had 2+ TFLOPs* of compute
(2x10^12), which is 100,000,000 times faster than the
computer that sent Apollo to the moon
• The Frontier supercomputer passed 1.0 exaflops (10^18)
• 1.5 exaflops on the “virtual supercomputer” that
combined for the Folding@Home COVID-19 simulation
• Top500 supercomputing clusters add up to ~10 exaflops
• NVIDIA RTX 4090s shipped add up to at least 13 exaflops
• PlayStation 5s combined surpass 250 exaflops
• Apple shipped over 1 zettaflop (10^21) of compute in 2022
• Intel is working toward a zettaflop supercomputer
By 2027, hundreds of zettaflops seems plausible. By then,
compute at the start of 2023 will seem like a
rounding error again.
Compute before
2020 is a rounding
error vs. today
Sources: “What’s the total global compute capacity?”
https://twitter.com/jradoff/status/1611534395780861953
* In all these comps I blur single vs. double precision & matrix vs. vector ops, so
it isn’t apples-to-apples. This will be a topic for a future post on global compute;
meanwhile, this still ought to provide a rough order-of-magnitude.
Top500 supercomputing clusters, showing the total, largest and the smallest.
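The comparisons on this slide are order-of-magnitude arithmetic, so they can be sanity-checked in a few lines. This is only a sketch: the figures are the ones quoted above, the variable names are illustrative, and the "implied Apollo" number is simply what the quoted 100,000,000x ratio works out to, not a measured value.

```python
# Order-of-magnitude check using only the figures quoted on the slide.
TERA, EXA, ZETTA = 10**12, 10**18, 10**21

phone_flops = 2 * TERA          # "2+ TFLOPs" per 2022 phone
apollo_ratio = 100_000_000      # phone vs. the Apollo computer, as quoted
implied_apollo_ops = phone_flops / apollo_ratio

frontier = 1 * EXA              # Frontier supercomputer
top500_total = 10 * EXA         # all Top500 clusters combined
apple_2022 = 1 * ZETTA          # Apple devices shipped in 2022

phones_per_frontier = frontier / phone_flops
apple_vs_top500 = apple_2022 / top500_total

print(f"Implied Apollo-era compute: {implied_apollo_ops:,.0f} ops/sec")
print(f"Phones needed to match Frontier: {phones_per_frontier:,.0f}")
print(f"Apple 2022 shipments vs. Top500 total: {apple_vs_top500:,.0f}x")
```

As the footnote says, these mix precisions and operation types, so the ratios are rough guides rather than benchmarks.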
16. How Parallel Work Happens on a Computer
CPU with 8 cores: “Ordinary” programs (serial execution) go to the OS, which separates them into threads. 8 programs run simultaneously (programs that have multiple concurrent threads can also run faster).
GPU with 100 cores**: A graphics or AI workload goes to CUDA*, which chops the workload into 100 tasks. Graphics or AI execution happens 100X faster. Good for local graphics generation, ray tracing (if you have lots of cores) and many AI inference jobs (the part where you evaluate the result of a previously-trained AI model).
*Or one of the several CUDA competitors
**Just for the visual example. In 2023, an NVIDIA RTX 4090 has a lot more: 16,384 cores!
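The "chop the workload into tasks" idea can be sketched in plain Python. This is a minimal illustration of the decomposition pattern only: `chop`, `shade`, and `run_parallel` are hypothetical names, the "pixel" work is a stand-in, and Python threads will not deliver the speedups a GPU does. The point is that each chunk is independent, so the chunks can be processed in any order and stitched back together.

```python
from concurrent.futures import ThreadPoolExecutor

def chop(workload, n_tasks):
    """Split a list into n_tasks nearly equal contiguous chunks."""
    size, extra = divmod(len(workload), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(workload[start:end])
        start = end
    return chunks

def shade(chunk):
    """Stand-in for per-pixel work: brighten each value, clamp at 255."""
    return [min(255, p * 2) for p in chunk]

def run_parallel(workload, n_workers):
    """Process independent chunks on a pool of workers, then recombine."""
    chunks = chop(workload, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(shade, chunks))  # map preserves chunk order
    return [p for chunk in results for p in chunk]

pixels = list(range(200))
parallel_result = run_parallel(pixels, n_workers=8)
serial_result = shade(pixels)
print(parallel_result == serial_result)
```

A GPU applies the same pattern at a much finer grain, with thousands of cores each handling a small slice of the pixels or matrix entries.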
17. The dramatic expansion of global compute
is being driven by GPUs.
GPUs are continuing to deliver performance
while CPUs have dropped off—which is
powering advances in graphics and AI. GPUs
are good at parallel compute (and can
access memory more efficiently) which is
important for both of these domains.
Performance is critical for making advanced
parallel computing available affordably for
more use cases.
Chart Source: NVIDIA
GPU Computing is Extending Moore’s Law
18. But AI models
grow faster than
GPU performance
Source: “On the Opportunities and Risks of Foundation Models,”
Narayanan et al
19. Fortunately, cost
per FLOP is
decreasing
Source: https://epochai.org/blog/trends-in-gpu-price-performance
20. Cost/FLOP + Better Algorithms = Lower Costs
ImageNet training costs decreased >95% in 4 years
DeepMind: using AI to improve the matrix multiplication algorithm
(used by AI -- and a lot of graphics transformations)
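To make "a better matrix multiplication algorithm" concrete, here is a sketch of the classic example of the idea: Strassen's method, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the textbook 8. To be clear, this is Strassen's long-known algorithm, not one of the algorithms from the DeepMind work; it only illustrates the kind of saving that work searches for.

```python
def naive_2x2(A, B):
    """Textbook 2x2 product: 8 scalar multiplications."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return [[a * e + b * g, a * f + b * h],
            [c * e + d * g, c * f + d * h]]

def strassen_2x2(A, B):
    """Strassen's 2x2 product: 7 scalar multiplications, more additions."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B) == naive_2x2(A, B))
```

One saved multiplication looks small, but when the entries are themselves matrix blocks and the trick is applied recursively, the saving compounds with matrix size.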
21. When computation needs exceed the power
of one computer you have a few options:
Build an actual supercomputer (CPUs/GPUs all in one
location, needs high speed interconnects). Currently
needed for workloads like simulations or training large
models.
Build a virtual supercomputer. Examples:
• Folding@Home, good for huge workloads when
latency and shared memory don’t matter much
• Ethereum network – good for cryptographic and
smart contract workloads
• Put code into containers and orchestrate them over
large CPU capacity using Kubernetes, Docker
Swarm, Amazon ECS/EKS, etc.
22. How work is parallelized across many computers
Network of 16 GPUs/CPUs: GPU-accelerated code (AI training, etc.) goes to orchestration*, which distributes & schedules containerized code out to the network. Workloads happen 16X faster than on a single device. Good for very large jobs such as training a big AI model, protein folding, simulations**, cinematic ray tracing. In aggregate, networks form into high-performance supercomputers.
Network of Servers: Server code and serial programs are packaged into microservice containers; orchestration* distributes & schedules the containers across the network. Workloads can support 16X the number of users. This is more about supporting high scale than speed alone. Workloads can be centralized, distributed across multiple datacenters, or deployed at edge networks (or even delivered to individual developer workstations).
*Kubernetes, Amazon ECS/EKS, Docker Swarm, etc.
** Some sims like Folding@Home started with CPU and added GPU optimizations later.
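The "distribute & schedule" step can be sketched as a toy scheduler. This is a simplified illustration, assuming every container has a known cost and every node is identical; `schedule` is a hypothetical helper, not the API of Kubernetes, ECS, or Docker Swarm, which also weigh memory, affinity, failures and much more.

```python
import heapq

def schedule(task_costs, n_nodes):
    """Greedy least-loaded placement: take tasks largest-first and give each
    one to whichever node currently has the smallest total load."""
    heap = [(0, node) for node in range(n_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    placement = {node: [] for node in range(n_nodes)}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        placement[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    makespan = max(load for load, _ in heap)  # finish time of the busiest node
    return placement, makespan

# 64 equal-sized containers (hypothetical names), on 1 node vs. 16 nodes.
tasks = {f"container-{i}": 1 for i in range(64)}
_, one_node = schedule(tasks, 1)
_, sixteen_nodes = schedule(tasks, 16)
speedup = one_node / sixteen_nodes
print(f"1 node: {one_node} time units, 16 nodes: {sixteen_nodes}, speedup: {speedup:.0f}x")
```

The clean 16X here depends on the work dividing evenly and the tasks not needing to talk to each other; real clusters give some of that back to communication and coordination.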
23. Scaling GPU Parallelism
(GPU Power) X (Networked Capacity)
= AI Supercomputer.
Source: https://www.gwern.net/Scaling-hypothesis
24. Scaling CPU
Parallelism
Using containerized code,
you can scale CPU-
workloads needed for
persistent worlds to
thousands of virtual
backend machines in
minutes.
Source: “Scaling Containers on AWS in 2022,” Vlad Ionescu,
https://www.vladionescu.me/posts/scaling-containers-on-aws-in-2022/
25. Vast worlds may
be simulated
on-device
Some technologies in Unreal 5:
World partitioning allows open
worlds to be stitched together
Nanite allows designers to create
images of any geometry and place
it in any world.
Lumen is a ray-tracing system that
looks amazing, runs on consumer
hardware, and spares developers
from having to “bake” lighting
before each build.
Massive city-scale environment from the Unreal 5 “Matrix Awakens” demo
26. Ray Tracing in 2018: Compared to 2022:
Fortnite using Lumen on a PlayStation 5
NVIDIA RTX demo (RTX-4090 is ~$1600)
27. Generative AI for graphics
2D: Midjourney, “concepts for a charming
sorcerer” (runtime: ~20 seconds)
3D: Point-E from OpenAI, text-to-3D
28. Neural Radiance Fields (NeRF): 2D to 3D
AI generates 3D scenes and meshes from 2D images taken
from a small number of viewpoints. The simplest way to think about NeRFs
is as “inverse ray tracing,” where the 3D structure of a scene is
learned from the way light falls on different cameras.
https://www.matthewtancik.com/nerf
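To unpack the "inverse ray tracing" description: a NeRF renders each pixel by sampling a density and a color at points along the camera ray and compositing them front to back, then training adjusts the scene representation until the rendered pixels match the photos. Below is a minimal sketch of just that compositing step, using the standard volume-rendering weights (opacity `1 - exp(-density * step)` scaled by the light still unabsorbed). The function name and sample values are made up for illustration, and the neural network and training loop are left out.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray, front to back.
    densities: non-negative density per sample
    colors:    (r, g, b) per sample
    deltas:    distance covered by each sample"""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Empty space, then a dense red surface: the pixel comes out red.
pixel, remaining = render_ray(
    densities=[0.0, 1000.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[1.0, 1.0],
)
print(pixel, remaining)
```

Because every step is differentiable, the error between rendered and photographed pixels can be pushed back into the densities and colors, which is how the 3D structure gets learned.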
Applications:
• Make 3D creation accessible
to photographers – more
storytelling and virtual world
content
• Alternative to complicated
photogrammetry
30. AI could generate
entire multiplayer
worlds
Beamable demonstrated a proof-of-concept
using ChatGPT that used a natural-language
interface to bootstrap a massively-
multiplayer game backend within Unreal
Engine.
Source: https://bit.ly/MMORPG_ChatGPT
31. AI can play sophisticated
social games
• In 2022, Meta AI showed that an AI (CICERO)
could be trained on games recorded on a Web-
based Diplomacy platform
• This requires a combination of strategic reasoning
and natural language processing
This hints at a future with AI that will:
• Help you work through longer, more-complex
plans like composing an entire world
• Participate “in-the-loop” of virtual
experiences and games, acting as social
collaborators and competitors
Source: https://ai.facebook.com/research/cicero/diplomacy/
32. AI can learn and
play compositional
methods
In 2022, OpenAI demonstrated through a method
called Video Pre-Training (VPT) that an AI could
learn to play Minecraft.
This resulted in the ability to perform common
gameplay behaviors – as well as compositional
activities like building a base.
This further reinforces the idea of AI-based
virtual beings that can populate worlds – as well
as act as partners in the creative process.
33. GTA V: GAN
Theft Auto
An AI was trained to generate a game based on Grand Theft
Auto by watching and then learning to play it. Source:
https://github.com/sentdex/GANTheftAuto/
34. Omniverse shows the power of real-time
compositional frameworks
Demo video: https://www.youtube.com/watch?v=EKJXI1xW4gw
Connectors to common 3D toolsets
Generative AI to bring in 3D objects
on-demand
Real-time ray tracing allows designers
to collaborate on experiences in real time
35. Beamable extends the compositional framework of 3D
Engines like Unreal and Unity to enable persistent
worlds, by integrating microservices for dynamic
economies and multiplayer systems at scale.
36. Workloads Decentralizing
Accessing zettaflops of untapped device-level compute
Past
• Most AI training in the cloud (e.g.,
foundation models)
• Most AI inference in the cloud
(e.g., ChatGPT)
• Pre-rendered graphics, “baked”
lighting, graphics optimized by
shader wizards
• Multiplayer consensus via
centralized CPU compute
workloads (walled-garden)
Future
• Personalized AI models trained on
device & federated learning
• Local, personalized AI inference
• AI models embedded into individual
products, trained/maintained by
product teams for “in-the-loop” AI
• Physics-based simulation on-device;
product teams focus on content
instead of optimization
• Decentralized consensus for more
multiplayer workloads (identity,
blockchain, containers, distributed
virtual compute)
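As one illustration of the "trained on device & federated learning" bullet, here is a toy sketch of federated averaging: each device improves a shared model on its own private data, and only the resulting weights travel to a server, which averages them. Everything here is an assumption for illustration (a one-parameter model `y = w * x`, two devices, hypothetical helper names); real systems add secure aggregation, device sampling, and far larger models.

```python
def local_update(w, data, lr=0.1, steps=20):
    """A few gradient-descent steps on one device's private (x, y) pairs,
    fitting y = w * x with squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, devices):
    """One round: every device trains locally, then the server averages the
    returned weights, weighted by how many examples each device holds."""
    total = sum(len(d) for d in devices)
    return sum(local_update(global_w, d) * len(d) for d in devices) / total

# The raw data below never leaves its device; only weights are shared.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],  # device A
    [(3.0, 9.0)],              # device B
]
w = 0.0
for _ in range(5):
    w = federated_round(w, devices)
print(round(w, 6))
```

Both devices hold data consistent with `w = 3`, so the shared model converges there without either device revealing its examples.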
37. Augmented Reality
transcends the
Holodeck
A feature of the Holodeck was actual
force-feedback. But beyond some simple
haptic feedback, it may not be great to
get slammed by force fields.
Unlike the holodeck, the metaverse will
infuse the real world with digital
holograms, AI-inference of the local
environment and computation
driven by digital twins. We’ll collaborate,
play and learn in ways unbounded by
any one environment.
Proof of concept from Meta Reality Labs
38. Simulations and games
can reflect physical reality
Cesium is a technology that creates a
digital twin of the Earth you can
import into Unity and Unreal. The
above is a demo of an F-35 flight
simulator built with it.
Source:
https://bit.ly/CesiumFlightSim
39. Who will be Disrupted?
a16z estimates that games will be
impacted most.
The impact will not simply be the
disruption from letting people make
the same games but cheaper – it will
be making new kinds of games with
new and smaller teams.
Many categories of traditional media
are projecting into virtual worlds,
becoming more game-like. Consider
that by January 2023, 198M people
had viewed the Travis Scott music
concert that originally appeared inside
Fortnite. All media will follow where
games will lead.
40. Everything will be disrupted
On-Device Physics / Ray Tracing
• Creator Economy: much more efficient, e.g., less time wasted baking, optimizing, generating builds with ray tracing.
• Experiences: worlds that look more realistic; worlds with more dynamic interactions and simulations. Creator-player lines blur.
Generative AI
• Creator Economy: rapid content iteration & composition (art, music, narratives, 3D models, entire worlds).
• Experiences: “in the loop” generative AI (e.g., characters; translation; stories); new game systems and use cases.
Scalable Backends (e.g., Microservices)
• Creator Economy: lower-risk ability to scale backends & logic; efficient to build/ship, easier to set up dev machines.
• Experiences: larger, more detailed worlds; more social & online products; interconnected economies.
41. The world that is arriving is one where we
can imagine anything – and experience
these virtual worlds alongside our friends.
The metaverse of multiverses beckons us.
And the universe said you are the universe tasting
itself, talking to itself, reading its own code
–Julian Gough, Minecraft End Poem
Jon Radoff | Metavert
https://twitter.com/jradoff
https://linkedin.com/in/jonradoff
This Composition is licensed under
Creative Commons Attribution 4.0