Object-based audio improves spatial accuracy by treating sounds as independent audio objects that can be precisely positioned in 3D space, rather than baking them into fixed speaker channels. Key benefits: maximum spatial precision, the ability to leverage hardware capabilities through object rendering, and authoring once while sounds adapt to any output configuration.
5. HRTF (Head-Related Transfer Function)
HRTF models how a given sound is filtered by the diffraction and reflection properties of the head, pinna, and torso before it reaches the eardrum and inner ear.
It varies significantly from person to person.
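At its core, HRTF rendering is convolution of a sound with a per-ear impulse response. The sketch below is a minimal, dependency-free illustration with toy impulse responses; real HRIRs come from measured datasets, and the function names here are illustrative, not any engine's API.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono buffer to binaural stereo by convolving it with a
    head-related impulse response (HRIR) pair for one source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs: the right ear hears the sound later and quieter, crudely
# mimicking a source off to the listener's left.
hrir_l = [1.0, 0.2]
hrir_r = [0.0, 0.0, 0.6, 0.1]

left, right = binauralize([1.0, 0.0, 0.0, 0.0], hrir_l, hrir_r)
```

A real renderer would pick (or interpolate) an HRIR pair per source direction and update it as the source moves.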
16. What is a System Audio Object?
An audio buffer accompanied by metadata: position, distance, azimuth, elevation, focus, spread.
!!! NOT to be confused with Game Objects.
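Conceptually, a System Audio Object is just a buffer plus metadata. The sketch below is a hypothetical Python model (the field names are illustrative, not a real Wwise or platform API); it also shows how azimuth, elevation, and distance can be derived from a 3D position.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    """An audio buffer accompanied by positioning metadata (illustrative)."""
    buffer: list            # mono PCM samples
    position: tuple         # (x, y, z), listener-relative; y = forward
    spread: float = 0.0     # 0 = point source, 1 = fully diffuse
    focus: float = 1.0

    def direction(self):
        """Derive azimuth and elevation (degrees) and distance from position."""
        x, y, z = self.position
        dist = math.sqrt(x * x + y * y + z * z)
        azimuth = math.degrees(math.atan2(x, y))   # 0 deg = straight ahead
        elevation = math.degrees(math.asin(z / dist)) if dist else 0.0
        return azimuth, elevation, dist

obj = AudioObject(buffer=[0.0] * 512, position=(1.0, 0.0, 0.0))
az, el, dist = obj.direction()
```

The endpoint's renderer consumes this metadata when positioning the buffer in 3D space.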
21. Platform Channel Mix and System Audio Object Limits

Format                          | Max Static Objects (Channel Bed)* | Dynamic: PS5 | Dynamic: Xbox Series X|S, UWP apps & >=2303 GDK | Dynamic: Xbox Series X|S, XDK & <2303 GDK | Dynamic: Xbox One
PS5 3D Audio                    | 7.1 / Ambisonic 5th Order*        | 128          | -    | -   | -
PS5 Non-3D Audio                | 7.1 / Headphones                  | N/A          | -    | -   | -
Windows Sonic (Headphones)      | 17 (8.1.4.4)*                     | -            | 220  | 20  | 15
Dolby Atmos (Headphones)        | 17 (8.1.4.4)*                     | -            | 128  | 20  | 16
DTS Headphone:X (Headphones)    | 17 (8.1.4.4)*                     | -            | 200  | 20  | 16
Dolby Atmos Home Theater (HDMI) | 12 (7.1.4)*                       | -            | 20   | 20  | 20
DTS:X for Home Theater (HDMI)   | 17 (8.1.4.4)*                     | -            | 20   | 20  | 16

* Can be spatialized.
● Support for Spread
● Time sync across static/dynamic objects
Hardware Spatial Audio Capabilities - Resource Limits
24. Delivering the "best" mix
"The best possible way to ensure a good mix is to audition it"
● Stereo
● 5.1
● 7.1.4
Rely on the adaptability of your sound engine to deliver the best mix for the listening configuration.
26. Design / Authoring Configurations vs. Auditioning Configurations
Different modes of listening, etc.
27. Designed for the Best Possible Audio
● Output configurations dynamically conformed to the endpoint
● Informed by authoring
● Simplified complexity
28. Object-based Audio
A way to deliver audio, along with its metadata, to an endpoint.
Benefits
● Best possible spatial precision
● Author once for all outputs
● Can be hardware accelerated
● Opens the door to HRTF
Considerations
● HRTF/binauralization will "color" your sound. Some sounds might be best represented as NOT an Audio Object.
● Should you do both a binaural and a non-binaural mix?
● If a system doesn't have Audio Objects, the sound engine should simply fall back on channel-based output.
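The fallback rule above can be sketched as a small routing function. This is a hypothetical simplification of what a sound engine like Wwise does automatically, not actual engine code; the parameter names are assumptions.

```python
def route_sound(has_3d_position, endpoint_supports_objects,
                objects_in_use, object_limit):
    """Prefer a System Audio Object when the endpoint supports them and a
    slot is free; otherwise fall back to the channel-based main mix."""
    if not endpoint_supports_objects:
        return "main_mix"                     # channel-based fallback
    if has_3d_position and objects_in_use < object_limit:
        return "audio_object"
    return "main_mix"                         # rendered into the spatialized bed
```

The same authored content works in both cases; only the delivery path changes.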
30. Wwise Audio Pipeline
[Diagram: inside Wwise, Audio Objects, a Passthrough Mix, and a Main Mix are produced according to a user-defined output configuration and delivered to the endpoint's audio configuration.]
31. Wwise Spatial Audio Pipeline Overview
[Diagram: SFX, Music, VO, and Ambience voices route, with DSP, into three paths inside Wwise: a channel-based Passthrough Mix (ex. Stereo), a channel-based Spatial Bed / Main Mix (ex. Ambisonics, virtualized), and Audio Objects (+ Metadata). The user-defined output configuration (speakers, headphones, spatialization) initializes Wwise to match the endpoint, where the Audio Device applies its own mix and DSP on the Master Audio Bus output.]
37. Main Mix: Automatic Output Determination
https://www.audiokinetic.com/library/edge/?source=Help&id=system_audio_device
Wwise will help determine the output type:
Passthrough Mix
● It has a mono or stereo channel configuration.
● It does not have a 3D position.
If not…
Audio Object
● It has a 3D position.
● Its Speaker Panning / 3D Spatialization Mix is set to 100%.
● It has a standard channel configuration* that does not have any height channels.
● It would not exceed the number of available Audio Objects.
If not…
Main Mix
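The determination rules above can be restated as a small decision function. This is a hypothetical sketch of the documented rules, not Wwise code; the parameter names are assumptions.

```python
def determine_output(channels, has_height, has_3d_position,
                     spatialization_mix, objects_in_use, object_limit):
    """Decide which output path a sound takes, per the rules on slide 37."""
    # Passthrough Mix: mono or stereo, and no 3D position.
    if channels <= 2 and not has_3d_position:
        return "passthrough"
    # Audio Object: 3D-positioned, fully spatialized, no height channels,
    # and an object slot is still available.
    if (has_3d_position and spatialization_mix == 1.0
            and not has_height and objects_in_use < object_limit):
        return "audio_object"
    # Otherwise it is rendered into the Main Mix (spatialized bed).
    return "main_mix"
```

Adding Wwise System Output Settings Metadata to a sound overrides this automatic choice.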
38. Manual Output Assignment using Metadata
● Use Default
● Mix to Main
● Mix to Passthrough
45. How do I prioritize which sounds become Audio Objects?
46. Audio/Auxiliary Bus: 3D Audio Bed Mixer
● Can reduce the number of Audio Objects passing through the bus
● Mixes Audio Objects over the defined limit, depending on the behavior settings
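One way to handle Audio Objects over the limit is to rank sounds by priority, keep the top of the list as dynamic objects, and return the remainder to the spatialized bed. The sketch below is a simplified model of that idea, not the actual bed-mixer behavior settings.

```python
def split_by_priority(objects, limit):
    """Keep the highest-priority sounds as dynamic Audio Objects; the rest
    are mixed into the spatialized bed. 'objects' is a list of
    (name, priority) pairs."""
    ranked = sorted(objects, key=lambda o: o[1], reverse=True)
    return ranked[:limit], ranked[limit:]

keep, to_bed = split_by_priority(
    [("footstep", 90), ("ambience", 10), ("gunshot", 100), ("bird", 30)],
    limit=2)
```

Sounds moved to the bed still get spatialized and binauralized, just at the bed's channel resolution rather than as discrete objects.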
51. Bus Configurations
● Same as parent
● Audio Objects
● Same as main mix
● Same as passthrough mix
52. Hardware Acceleration & Capabilities

PS5™
Format       | Channel Mix                | Audio Objects
3D Audio     | 7.1 / Ambisonic 5th Order  | Many
Non-3D Audio | 7.1 / Headphones           | N/A

Windows 10 / Xbox One / HoloLens (*LFE not counted)
Format                                       | Channel Mix  | Windows 10 | Xbox One | HoloLens
Windows Sonic for Headphones                 | 16 (8.1.4.4) | 112        | 16       | 31
Dolby Atmos (Headphones & Built-in Speakers) | 16 (8.1.4.4) | 16         | 16       | N/A
Dolby Atmos (HDMI)                           | 12 (7.1.4)   | 20         | 20       | N/A
DTS Headphone:X (Headphones)                 | 16 (8.1.4.4) | 32         | 32       | N/A

* iOS / Android TBD
https://learn.microsoft.com/en-us/windows/win32/coreaudio/spatial-sound
53. Per-Platform Changes: Bus Configuration
Optimizing the Bus Configuration to account for differences in the availability of System Audio Objects across platforms (e.g. 100's of Objects on one platform vs. 15 Objects on another; Override vs. Default per platform).
55. HRTF Renderers on Windows

Windows Sonic
● Included with Windows; free and already installed.
● https://support.microsoft.com/en-us/windows/how-to-turn-on-spatial-sound-in-windows-10-ca2700a0-6519-448d-5434-56f499d59c96

Dolby Atmos
● Get with Dolby Access: https://apps.microsoft.com/store/detail/dolby-access/9N0866FS04W8?hl=en-us&gl=us
● What's Dolby Atmos? https://youtu.be/XfSj4wIcLIY
● Dolby Atmos + Wwise: https://games.dolby.com/atmos/wwise/
● Paid plugin. Only supports 16 Audio Objects, but work is underway on extending it.

DTS
● Get with DTS Sound Unbound: https://apps.microsoft.com/store/detail/dts-sound-unbound/9PJ0NKL8MCSJ?hl=en-us&gl=us
● Paid plugin and the most expensive. Only supports up to 32 Audio Objects.

* Mac not yet supported by Wwise.
56. Audio/ Auxiliary Bus: Effects
Effect Plug-ins can use Audio Objects Metadata for processing
● Effects can modify Audio Object configurations
● Audio Objects can be gathered and signals processed per-object
● Example: Wwise Compressor
○ Audio Objects are gathered and evaluated as a group
○ Volume offset is calculated and applied to each Audio Object
○ Metadata is preserved
[Diagram: Audio Objects enter an Audio Bus running the Wwise Compressor (DSP). The Audio Objects are gathered, mixed to calculate the volume reduction, and the resulting volume offset is applied to each Audio Object.]
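The Compressor example above boils down to: analyze a downmix once, derive one gain, apply it to every object, and leave metadata untouched. The sketch below is a toy model with linear peak levels and a hard knee; it illustrates the flow, not the actual Wwise implementation.

```python
def compress_objects(objects, threshold, ratio):
    """Gather Audio Objects, analyze an internal downmix once, and apply a
    common volume offset to each object. Metadata is preserved."""
    # Analysis phase: peak of the summed downmix of all object buffers.
    downmix = [sum(frame) for frame in zip(*(o["buffer"] for o in objects))]
    peak = max(abs(s) for s in downmix)
    if peak <= threshold:
        gain = 1.0
    else:
        # Hard-knee compression of the amount over threshold.
        gain = (threshold + (peak - threshold) / ratio) / peak
    # Apply the common gain per object; positioning metadata is untouched.
    for o in objects:
        o["buffer"] = [s * gain for s in o["buffer"]]
    return gain

objs = [{"buffer": [0.5, 0.0], "position": (1, 0, 0)},
        {"buffer": [0.5, 0.0], "position": (0, 1, 0)}]
gain = compress_objects(objs, threshold=0.5, ratio=2.0)
```

Because the gain comes from one shared analysis pass, the relative balance and positions of the objects are preserved.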
57. Incompatible Effects
(Demonstrated in Wwise)
The following Effects are not supported by busses that are processing Audio Objects:
● Wwise Convolution Reverb, Wwise Matrix Reverb, and Wwise RoomVerb: Running one instance of the Effect for each Audio Object would cause performance issues.
● Wwise Peak Limiter: Peak limiting at the Audio Object level would be unreliable. When authoring Audio Objects, use the Mastering Suite plug-in on an Audio Device to apply peak limiting.
● Wwise Recorder: The Recorder cannot run multiple instances.
● Auro Headphone: Not supported.
https://www.audiokinetic.com/library/edge/?source=Help&id=using_effects_with_audio_objects#effects_on_mixing_bus
58. Bus Instance Object Processors
(Demonstrated in Wwise)
Certain Wwise Effects support Audio Objects intrinsically. Such Effects are called Object Processors and are instantiated only once per bus instance:
● Wwise Compressor: The Compressor is instantiated once and performs the analysis phase once on an internal downmix. The gain reduction is common to all Audio Objects.
● Mastering Suite: The multiband compressor works much like the Compressor.
● Wwise Meter: Analysis is performed once on an internal downmix.
● Wwise Reflect: When set to output Audio Objects, it produces one Audio Object per reflection.
https://www.audiokinetic.com/library/edge/?source=Help&id=using_effects_with_audio_objects#effects_on_mixing_bus
59. Wwise Routing - Defined by the Bus
[Diagram: three Audio Objects are routed through a "Same as Parent" bus to the Master Audio Bus and on to the Audio Device, with 3D Audio active (7.1.4 bed + 2 objects). Two objects with Metadata: Default are preserved to the endpoint as Audio Objects; one object with Metadata: Same as Main Mix is mixed into the 7.1.4 Main Mix.]
Any audio object without positioning will end up in the Main Mix or Passthrough.
60. Wwise Routing - Defined by the Bus
[Diagram: three Audio Objects are routed through a "Same as Parent" bus to the Master Audio Bus and on to the Audio Device, with 3D Audio active (7.1.4 bed + 2.0 passthrough + 1 object). One object with Metadata: Default is preserved to the endpoint as an Audio Object; one with Metadata: Same as Main Mix is mixed into the 7.1.4 Main Mix; one with Metadata: Same as Passthrough is mixed into the 2.0 Passthrough Mix.]
Editor's Notes
Many thanks to DevGAMM for inviting me to talk.
My name is Mads, pronounced without saying the D, and I work at Audiokinetic.
Today I'll be talking about How to Improve Spatial Accuracy of your mix using Audio objects, which is also referred to as …
Object-based Audio
Binaural Audio
3D Audio
HRTF
or sometimes Spatial Audio, even though that might also be something quite different.
There are minor differences in between the terms, but in general we'll refer to this as either Object-based Audio or that you can send Audio Objects.
So "why" do you have to think about Object-based Audio?
Technology is constantly evolving … and before we know it … something like Object-based Audio could easily be the norm.
Especially if you've got a couple of years ahead on your current game production, then by the time your game is released, many might EXPECT your game to feature 3D audio.
Just to make sure everyone knows the terms 3D audio and HRTF, then…
3D Audio is a way to filter sounds so they sound like they come from above, behind, etc. and for calculating how this filtering should be applied…
… a head-related transfer function is calculated, for instance based on a head like this. Sounds are then filtered to simulate the effect of these body properties, so HRTF'ed sounds will feel much more like they are out in the real world, not on top of your head.
And something like 3D audio seems to soon become the norm.
"How many of you have experienced 3D audio?"
3D audio is available in consoles, computers… even the small Apple in-ears you see everywhere have "Spatial Audio" mode, which is basically 3D audio.
So whether you learn about it now or later, you'll probably have to know / account for it.
3D audio is not really a new thing… BUT now it's much more accessible.
Hardware support
Adoption
Tool support
E.g. in Wwise you've been able to do this for years, but in 21.1 it was made much easier to setup and more importantly PROFILE. Troubleshooting what's wrong and why.
Why is 3D audio getting so popular? Well, who doesn't have a pair of headphones? And it seems there will be many more headphones out there in the future.
So all it takes is a pair of headphones, and maybe a personal HRTF, which could be used in 3D Audio.
We like to refer to this as Object-based Audio …
… and I'll tell you more about the differentiation later …
… but let's start by going through the benefits and potential problems.
First, and maybe most important of all, is it's possible to maximize Spatial Accuracy.
To explain this, let's consider a hypothetical problem.
1x 4 channel ambience
1x 1 channel bird sound positioned in space.
With channels, the bird would simply be mixed into the channels closest to the sound position.
What's the problem?
The Problem is that the 3d positioned sound is tied to the channel mix.
For example, if you choose to move one speaker …
… the sound would now come from back right, instead of front right.
The problem here is not about speaker positions, but that this is ALSO what you'll deliver to 3D Audio.
… 3D Audio, the HRTF renderer ONLY has the channel mix to apply filters on.
So the front right channel will get filtered as being front right … and the back right channel will get filtered as back right.
At this stage, it's impossible for HRTF to untangle the sound from the channels for HRTF calculations.
It might provide OK results, but the spatial accuracy will be limited to the channel count.
Imagine this… what if we placed a separate speaker at the bird's position? Then the sound would not be dependent on your channel mix, but would always be at the exact position it needs to be.
So when sending this separate speaker to the HRTF rendering, it will be able to MUCH more precisely represent it, no matter the channel mix.
This is also how you should think about the idea of 3D audio. Maybe keep sending the channel mix, but for the more precise sounds, you send them separately outside of the HRTF rendering.
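The "bird mixed into the nearest channels" problem can be seen in a simple constant-power panner: the source position is baked into channel gains, and those gains are all a downstream HRTF renderer ever sees. The sketch below pans a mono source between the two nearest speakers of a quad layout; the speaker azimuths are assumptions for illustration.

```python
import math

def pan_quad(azimuth_deg, speaker_azimuths=(45, 135, 225, 315)):
    """Constant-power pan of a mono source between the two nearest
    speakers. Spatial accuracy is limited to the channel layout: the
    exact source direction is lost once it becomes channel gains."""
    az = azimuth_deg % 360
    spk = sorted(speaker_azimuths)
    gains = {a: 0.0 for a in spk}
    # Find the adjacent speaker pair bracketing the source azimuth.
    for i in range(len(spk)):
        lo, hi = spk[i], spk[(i + 1) % len(spk)]
        span = (hi - lo) % 360
        offset = (az - lo) % 360
        if offset <= span:
            frac = offset / span
            gains[lo] = math.cos(frac * math.pi / 2)  # constant-power law
            gains[hi] = math.sin(frac * math.pi / 2)
            break
    return gains

g = pan_quad(90)   # a source straight ahead of a 45/135/225/315 quad layout
```

An Audio Object instead carries the azimuth itself to the renderer, so no precision is lost to the intermediate channel layout.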
But what is an Audio Object > read slide.
So basically, it's a sound with some information on it, and once it reaches the HRTF renderer, this will get consumed.
Positioned in this direction.
At that height.
With that much spread.
There's been many technological advancements recently to bring you 3D Audio…
Yes I will help you binauralize your audio… but technically it's a separate decoder you can send your sounds to … and it will process these sounds.
Without it, we'd have to use the CPU to decode and handle sounds, in that constant struggle with the other teams on your game's development that also need CPU processing. But with a separate decoder, you get dedicated processing of your sounds.
Both of these consoles have audio decoders to which we can send sound to be processed: not only audio objects, but also the channel mix and more.
Back to Audio Objects: how many can we send to this audio decoder?
Should some sounds forgo an object slot when other, more vital sounds need one? Use priorities to define this.
Doubling numbers in audio objects since last year.
And this is of course something you tightly monitor in your sound engine tool.
Another benefit would be the ability to author once for all outputs.
Let's say you're in the final stage of development and you need to do a mix of the game.
Well … Best possible way to ensure a good mix is to audition it …
… but having to redo a mix for every output configuration is not always an option …
… so it's not uncommon to just mix with the highest resolution (7.1.4) and then "Rely on the adaptability…"
7.1.4 is not the best for all outputs, because it cannot use object-based audio.
So in other words, many are simply mixing for the highest channel configuration and then "cross fingers" on it downscaling well.
But now that we have object-based audio we can use an even higher "resolution", with just the use of a pair of headphones.
To better understand it, let's compare it to image file-types.
A JPG has a fixed resolution, just like channels; no matter what size, it will always be the same resolution. Here the small dots are ALSO pixels, but you don't notice them.
For larger formats like posters, you might instead use vector images, which can be scaled up infinitely, so no more pixelation when making huge wall prints or posters.
Audio Objects are kinda like vector images. No matter the output resolution, it will always look sharp.
Therefore the takeaway is … Object-based Audio allows you to deliver the best possible mix for a wide range of listening configurations.
So if you are using 3D audio, then you'd already use one of the high resolution for spatial precision, which should translate well into whatever other output configuration might be used.
HRTF will affect your sounds…
Low end is filtered out
Filters introduce
Coloration
Phase shifting
Ask yourself, what's more important? the interactive or aesthetic fidelity
Should you do a binauralized mix and one that is not?
Well… the technology is moving rapidly towards having these kind of 3D audio on by default.
Take Apple in-ears. How many of you actually "disable" the spatial audio feature?
Enough conceptual discussion, let's talk about how it's setup … and here I'll use Wwise, which has clearly defined paths and profiling tools …
… but remember that Wwise has been adapted for the technology … so the differentiation between objects and channels is not exclusive to Wwise.
In Wwise, three different paths are defined.
When the game starts, Wwise is informed of the endpoint's audio configuration and dynamically configures its output accordingly.
Let's have a deeper look at how to use the pipeline.
Passthrough: Channel based + stereo. Most commonly used for Music.
Spatialized bed: For channel based stuff that doesn't need to become audio objects. Like an ambience, where channels are spread around you.
Audio Objects: For the sounds that need the best possible spatial accuracy.
Both Spatialized bed and Audio Objects will get HRTF'ed in 3D Audio.[MAYBE SHOW IN VISUALS]
As an exercise, let's create a soundscape of sounds.
First, on top of you … with no positioning and not spatialized … you can send sounds to the passthrough mix. This would most likely be music or UI sounds as they don't need to be spatialized, but rather just play the channels in 2D, as they were exported from your DAW.
Then on the outside, we add the more diffuse sounds for colouring the soundscape, like a campfire, ambience from woodlands, or birds.
And in between we assign the more vital sounds, or important to locate in space, to Audio Objects … so we ensure whatever the output is, we represent those sounds with the best possible accuracy.
This might seem like a lot of additional work … but if you are using a sound engine like Wwise … it will automatically determine the "proper" output.
We don't want to get too much into this … but basically … Wwise will see whether a sound would be ideal to have as an audio object, passthrough, or main mix.
However, if you don't agree with this, you can add Metadata to override it.
Here's an example of metadata added to a sound … you choose what output it should have … and this information will carry on to the endpoint to be consumed there.
What might be a very important detail is that 3D Audio does not necessarily include Audio Objects.
Some will use 3D audio on channel-based audio alone … for instance, send an ambisonics channel mix to the HRTF rendering and that's it.
Both channels (main mix) and Audio Objects are HRTF'ed, but of course you get the highest precision by using Audio Objects.
That said, binaural processing is then limited to the channel count of the main mix.
What you're about to see is…
Wwise connected to Unreal sample game
Object-based Audio (Windows Sonic) turned on
Wwise Audio Lab
Go to campfire
Zoom > System Audio Device
Only main mix populate
Turn on Windows Sonic
All meters populate
Audio Object 3D Viewer
Zoom-in
Turn: Directionality of sounds
Click object > sphere
Sending 200 sounds, which of the 111 will become audio objects?
First-come, first-serve basis. BUT…
Don't limit yourself.
Should some sounds forgo an object slot when other, more vital sounds need one? Use priorities to define this.
Doubling numbers in audio objects since last year.
Should you want to try these examples out yourself, you can grab WAL from the Audiokinetic Launcher.
For 3D audio, you can always use Windows Sonic.
Questions?
Bus configurations can be set on Audio and Aux busses to optimize the sources routed to a bus.
The “Same as parent” configuration indicates that the Bus will inherit the Bus Configuration of its parent in the hierarchy. The bus configuration of the Master Audio Bus is implicitly set to Parent, because it inherits the bus configuration of the associated audio output device.
The “Same as Main Mix” & “Same as Passthrough Mix” configurations indicate that the Bus will inherit the Bus Configuration of the initialized Main or Passthrough Mix.
Additional Bus Configurations include the usual channel-based configurations, including: 2.0, 5.1, 7.1, 7.1.4, up to 5th Order Ambisonic and many more.
If an Audio or Aux Bus is set to anything other than Parent or Audio Objects, it forces a submix at this level of the hierarchy and will consume any Audio Object Metadata.
(Also see: Understanding the Voice Pipeline and Understanding Bus Configurations)
Per-platform changes can be made across many of the properties in Wwise, including Bus Configurations, which allows for optimizations that adjust to platform-specific functionality.
Starting in Wwise 2021.1, effects no longer require mixing and can process audio objects individually.
Additionally, effects can use Audio Object Metadata for processing.
Effects can modify Audio Object configurations
Audio Objects can be gathered and signals processed per-object
Example: Wwise Compressor
Audio Objects are gathered and evaluated as a group
Volume offset is calculated and applied to each Audio Object
Metadata is preserved
In this example, All Audio Objects are routed to an Audio Bus with a “Same as Parent” Configuration.
2 Audio Objects and their Metadata will be preserved to the Endpoint and positioned binaurally as System Audio Objects.
1 Audio Object has Wwise System Output Settings Metadata added with the Mix Behavior of “Same as Main Mix” and will be mixed to the 7.1.4 configuration and then output to the endpoint where it will be virtualized in a 7.1.4 configuration and binauralized.
In this example, All Audio Objects are routed to an Audio Bus with a “Same as Parent” Configuration.
1 Audio Object and its Metadata will be preserved to the Endpoint and positioned binaurally as a System Audio Object.
1 Audio Object has Wwise System Output Settings Metadata added with the Mix Behavior of “Same as Main Mix” and will be mixed to the 7.1.4 configuration and then output to the endpoint where it will be virtualized in a 7.1.4 configuration and binauralized.
1 Audio Object has Wwise System Output Settings Metadata added with the Mix Behavior of “Same as Passthrough Mix” and will be mixed to the 2.0 configuration set by the Endpoint for the Passthrough Mix that will remain unfiltered.
Making decisions about what sounds or parts of a sound should remain unfiltered is often dependent on the content and intention of the interactive scenario.
Preserving Metadata, including 3D position, to the Audio Device without destroying it through the process of mixing is essential. An Audio Object without a 3D position will be mixed to the Main Mix by the Audio Device.