Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern approaches to model skinning

The State of Skinning
… Or How To Maintain Your Physique

Rulon Raymond
Sr. Engine Programmer
Introduction

Outline
1) Review
2) Evolution of techniques on console HW
3) The new hotness (hint: it’s a Clifford Algebra)
4) Extensions

DISCLAIMER: None of the screenshots or techniques presented are associated with any specific title, project, or organization unless otherwise stated.

What is Skinning?
“I Was Skinning Long Before 3D Animated Models Were All The Rage”

What is Skinning?
• Step 1: Generate a cool animated pose.
• Step 2: ???
• Step 3: Use fancy lighting and shaders to draw an animated model on-screen (i.e., profit).

What is Skinning?
• Step 2: Skinning!

What is Skinning?
[Figure: Model Vertices + Bone Weights + Bone Transforms → Skinned Model, ready for drawing]

What is Skinning?
$$v' = \mathrm{Skin}(W, T, v)$$
• $v$: the initial vertex transform
• $W$: array of bone weighting values
• $T$: array of bone transforms
• $v'$: the final vertex transform

Skinning on Consoles
• Sony PlayStation (1995)
  – Geometry Transform Engine (GTE)
• Sony PlayStation 2 (2000)
  – Vector Unit 0 (VU0)
• Microsoft Xbox (2001)
  – NVIDIA GPU (DirectX 8.x)

Skinning Implementation
• Microsoft Xbox 360 (2005)
  – PowerPC CPU
• Sony PS3 (2006)
  – Synergistic Processing Units (SPUs)

Skinning Implementation
Why not use the GPU for skinning on Xbox 360 and PS3?
The CPUs/SPUs are actually quite fast:
• Xbox 360: 3× PowerPC cores @ 3.2 GHz
• PS3: 6× SPUs @ 3.2 GHz (with many restrictions…)

Split Vertex Streams
• Stream 0: Position, Tangent Space
  – Skinned
• Stream 1: Colors, UVs, etc.
  – Constant: sent straight to the GPU
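To make the split concrete, here is a minimal C sketch under an assumed layout: `SkinnedVertex` (stream 0) packs position, normal and tangent as three 16-byte vectors, and `StaticVertex` (stream 1) holds colors and UVs. All type and function names are hypothetical, and `SkinStream0` applies a single rigid bone transform as a stand-in for a real multi-bone skinner. The point is that only stream 0 is read and rewritten each frame.

```c
#include <assert.h>
#include <stddef.h>

/* Stream 0: rewritten by skinning every frame.
   Three 16-byte vectors per vertex. */
typedef struct
{
    float position[4];  /* xyz, w unused           */
    float normal[4];    /* tangent space           */
    float tangent[4];   /* tangent space, w = sign */
} SkinnedVertex;

/* Stream 1: constant data, uploaded once and bound directly for drawing. */
typedef struct
{
    unsigned char color[4];
    float uv[2];
} StaticVertex;

/* Rigidly transforms stream 0 by one 3x4 bone matrix m (row-major,
   column 3 = translation). Stream 1 is never read or written. */
static void SkinStream0( const float m[3][4], const SkinnedVertex *in,
                         SkinnedVertex *out, size_t count )
{
    assert( in != NULL && out != NULL );
    for ( size_t v = 0; v < count; ++v )
    {
        const float *p = in[v].position, *n = in[v].normal, *t = in[v].tangent;
        for ( int r = 0; r < 3; ++r )
        {
            out[v].position[r] = m[r][0]*p[0] + m[r][1]*p[1] + m[r][2]*p[2] + m[r][3];
            out[v].normal[r]   = m[r][0]*n[0] + m[r][1]*n[1] + m[r][2]*n[2];
            out[v].tangent[r]  = m[r][0]*t[0] + m[r][1]*t[1] + m[r][2]*t[2];
        }
        out[v].position[3] = p[3];
        out[v].normal[3]   = n[3];
        out[v].tangent[3]  = t[3];  /* keep handedness */
    }
}
```

Keeping the per-frame stream at 48 bytes per vertex also matches the three 16-byte vector writes on the Unified Memory Architecture slide.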

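Unified Memory Architecture
// Just skinned a vertex. Now write it out as
// three 16-byte vectors
__stvx( skinnedVertexData0, vertsOutBuffer, 0 );
__stvx( skinnedVertexData1, vertsOutBuffer, 16 );
__stvx( skinnedVertexData2, vertsOutBuffer, 32 );
// Gah – why'd that take so long?

// _WriteBarrier() keeps the compiler from reordering the
// stores, so they reach write-combined memory in order.
__stvx( skinnedVertexData0, vertsOutBuffer, 0 );
_WriteBarrier();
__stvx( skinnedVertexData1, vertsOutBuffer, 16 );
_WriteBarrier();
__stvx( skinnedVertexData2, vertsOutBuffer, 32 );
// ~20% faster!
// (F*&^% write-combine memory)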

Skinning on the CPUs/SPUs also leaves the GPU free for other work.

• Microsoft Xbox One (2013)
• Sony PS4 (2013)
  – Both: AMD GCN GPU

[Figure: GPU frame on a GCN compute unit: Draw Calls | IDLE | Draw Calls | Post FX | IDLE]

Async Compute Skinning
[Figure: GPU frame on a GPU compute unit with async compute: Draw Calls | Skinning | Draw Calls | Post FX | Skinning; skinning fills the idle gaps]

Async Compute Skinning
[Figure: Generate draw list (frame N) → visible models → async compute dispatch thread → model skinning workloads → skinned models (frame N), running alongside GPU rendering of frame N−1]

Linear Matrix Blend Skinning
The standard approach to real-time skinning, used in almost every modern 3D game. It suffers from some well-documented problems...

The “candy wrapper” effect

Mesh Volume Preservation
Example: “flat ass syndrome”

Q: Why do these problems exist?
A: Let’s take a closer look at the underlying math…

$$v' = \sum_{i=1}^{n} w_i M_{j_i} v$$

• Apply the property of distributivity:
$$v' = \left(\sum_{i=1}^{n} w_i M_{j_i}\right) v$$
• To keep it simple, let $M_{j_i}$ represent a rigid transform.
  – No scale, shear, …
  – The most common scenario for skinning in games.
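A minimal C sketch of the formula above, assuming 3x4 row-major bone matrices whose last column holds the translation. `LinearBlendSkin`, `boneIdx` (playing the role of $j_i$) and `w` (the $w_i$) are illustrative names. It blends the matrices first and then transforms the vertex once, exactly as the distributivity step allows.

```c
#include <assert.h>

/* v' = (sum_i w_i M_{j_i}) v for a single vertex position. */
static void LinearBlendSkin( const float bones[][3][4],
                             const int boneIdx[], const float w[], int n,
                             const float v[3], float vOut[3] )
{
    float m[3][4] = { { 0 } };
    assert( n > 0 );
    /* Blend the bone matrices first (distributivity)... */
    for ( int i = 0; i < n; ++i )
        for ( int r = 0; r < 3; ++r )
            for ( int c = 0; c < 4; ++c )
                m[r][c] += w[i] * bones[boneIdx[i]][r][c];
    /* ...then transform the vertex once. */
    for ( int r = 0; r < 3; ++r )
        vOut[r] = m[r][0]*v[0] + m[r][1]*v[1] + m[r][2]*v[2] + m[r][3];
}
```

Blending the matrices before transforming is what makes linear blend skinning cheap. It is also why any non-rigidity in the blended matrix flows straight into the result.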

• A linear combination of rigid transforms DOES NOT yield a rigid transform!
• Orthonormal matrices aren’t closed under addition.
• Scaling values can creep into the final vertex transforms.
• Extreme cases can result in rank-deficient matrices.
[Figure: $v$, $M_{j_1} v$, $M_{j_2} v$ and the blended $v'$. Example: the “candy wrapper” artifact]

• The most common workaround for these issues is adding new bones.
• Hand-animated or procedural.
• Split the rotation of a joint, relative to its parent, into even increments, for a single axis only.
• Example: Arm Twist Bone
• Parented to the shoulder and consistently represents exactly half its twist (roll) motion.

Adding these bones is not free!
• Memory and processing overhead.
• The exact amount depends on the actual implementation.

• Dual Quaternions to the rescue!
• But what exactly are they?
• Let’s start with a quick review of the vanilla variety of quaternions…

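Quaternions
Hamilton, 1843: a 4D extension of complex numbers
• $q = [a, b, c, d]$
• $q = a + bi + cj + dk$
• $i^2 = j^2 = k^2 = ijk = -1$
• $q = [r, \mathbf{v}],\; r \in \mathbb{R},\; \mathbf{v} \in \mathbb{R}^3$
• $q = [\cos\frac{\theta}{2}, \mathbf{s}\sin\frac{\theta}{2}],\; \theta \in \mathbb{R},\; \mathbf{s} \in \mathbb{R}^3$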

Quaternions
For our purposes, all we care about is unit quaternions.
• They conveniently represent rotations.
• Conjugate: $q^* = a - bi - cj - dk$
• $q^* = q^{-1}$ when $\|q\| = 1$

Quaternions
One important quaternion equation to note:
$$v' = q v q^*, \quad v = [0, x, y, z]$$
Applies a rotation to a 3D point.

Dual Numbers
$d = a + b\varepsilon,\; \varepsilon^2 = 0$
Similar in form to complex numbers.
Stored as: $d = [a, b]$

Dual Numbers
• Conjugate: $d^* = a - b\varepsilon$
• Multiplication: $d_0 d_1 = (a_0 + b_0\varepsilon)(a_1 + b_1\varepsilon) = a_0 a_1 + (a_0 b_1 + b_0 a_1)\varepsilon$
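Dual numbers are simple enough to sketch as a pair of floats. The `Dual` type and helpers below are illustrative. The multiplication drops the $b_0 b_1 \varepsilon^2$ term because $\varepsilon^2 = 0$.

```c
#include <assert.h>

/* Dual number d = a + b*eps with eps^2 = 0, stored as the pair (a, b). */
typedef struct { float a, b; } Dual;

/* Conjugate: a - b*eps. */
static Dual DualConj( Dual d )
{
    Dual r = { d.a, -d.b };
    return r;
}

/* (a0 + b0 eps)(a1 + b1 eps) = a0 a1 + (a0 b1 + b0 a1) eps */
static Dual DualMul( Dual d0, Dual d1 )
{
    Dual r = { d0.a * d1.a, d0.a * d1.b + d0.b * d1.a };
    return r;
}
```

A side effect of this rule is forward-mode differentiation: evaluating $f(a + \varepsilon)$ yields $f(a) + f'(a)\varepsilon$, so squaring $3 + \varepsilon$ gives $9 + 6\varepsilon$.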

Dual Quaternions
Basically a quaternion whose elements are dual numbers:
• $\hat{q} = \hat{w} + i\hat{x} + j\hat{y} + k\hat{z}$ (quaternion form)
  – $\hat{w}$ is the scalar part (dual number)
  – $\hat{x}, \hat{y}, \hat{z}$ is the vector part (dual vector)
• $\hat{q} = q_a + q_b\varepsilon$ (dual number form)
  – $q_a$: “non-dual part”
  – $q_b$: “dual part”
  – Most useful for skinning.

Dual Quaternions
• Multiplication: $\hat{p}\hat{q} = p_a q_a + (p_b q_a + p_a q_b)\varepsilon$
• Quaternion conjugate: $\hat{q}^* = q_a^* + q_b^*\varepsilon$
• Dual conjugate: $\bar{\hat{q}} = q_a - q_b\varepsilon$
• Quaternion & dual conjugate: $\bar{\hat{q}}^* = q_a^* - q_b^*\varepsilon = (q_a - q_b\varepsilon)^*$

Dual Quaternions
$$\mathrm{Norm}(\hat{q}) = \|q_a\| + \frac{\langle q_a, q_b \rangle}{\|q_a\|}\varepsilon$$
$\hat{q}^* = \hat{q}^{-1}$ when $\|\hat{q}\| = 1$

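Dual Quaternions
Rigid Transforms:
• $\hat{q}_{rotation} = q_a + 0\varepsilon$
• $\hat{q}_{translation} = (1, 0, 0, 0) + \frac{(0, t_x, t_y, t_z)}{2}\varepsilon$
• $\hat{q}_{rigid} = \hat{q}_{translation}\,\hat{q}_{rotation} = q_a + \frac{t}{2} q_a \varepsilon$, with $t = (0, t_x, t_y, t_z)$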

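Dual Quaternions
Transforming a 3D point:
$$v' = \hat{q}_{rigid}\, \hat{v}\, \bar{\hat{q}}_{rigid}^{\,*}, \quad \hat{v} = (1, 0, 0, 0) + (0, v_x, v_y, v_z)\varepsilon$$
The right-hand factor is the quaternion & dual conjugate. With the plain inverse $\hat{q}^{-1} = \hat{q}^*$, the two translation terms would cancel and only the rotation would remain.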
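To check these formulas end to end, the C sketch below builds $\hat{q}_{rigid} = \hat{q}_{translation}\,\hat{q}_{rotation}$ with full dual quaternion products and transforms a point with the quaternion & dual conjugate. It assumes a hypothetical `DualQuat` struct with each part stored [x, y, z, w], and it is written for clarity, not speed.

```c
#include <assert.h>
#include <math.h>

/* Non-dual part q_a in 'a', dual part q_b in 'b'; layout [x, y, z, w]. */
typedef struct { float a[4], b[4]; } DualQuat;

/* Hamilton product; safe when 'out' aliases an input. */
static void QMul( const float p[4], const float q[4], float out[4] )
{
    float r[4];
    r[0] = p[3]*q[0] + q[3]*p[0] + p[1]*q[2] - p[2]*q[1];
    r[1] = p[3]*q[1] + q[3]*p[1] + p[2]*q[0] - p[0]*q[2];
    r[2] = p[3]*q[2] + q[3]*p[2] + p[0]*q[1] - p[1]*q[0];
    r[3] = p[3]*q[3] - p[0]*q[0] - p[1]*q[1] - p[2]*q[2];
    for ( int i = 0; i < 4; ++i )
        out[i] = r[i];
}

/* (p_a + p_b eps)(q_a + q_b eps) = p_a q_a + (p_a q_b + p_b q_a) eps */
static DualQuat DQMul( DualQuat p, DualQuat q )
{
    DualQuat r;
    float ab[4], ba[4];
    QMul( p.a, q.a, r.a );
    QMul( p.a, q.b, ab );
    QMul( p.b, q.a, ba );
    for ( int i = 0; i < 4; ++i )
        r.b[i] = ab[i] + ba[i];
    return r;
}

/* q_rigid = q_translation * q_rotation */
static DualQuat DQFromRotTrans( const float q[4], const float t[3] )
{
    DualQuat rot   = { { q[0], q[1], q[2], q[3] }, { 0, 0, 0, 0 } };
    DualQuat trans = { { 0, 0, 0, 1 }, { 0.5f*t[0], 0.5f*t[1], 0.5f*t[2], 0 } };
    return DQMul( trans, rot );
}

/* v' = q v conj(q), with v = 1 + (0, v) eps and conj(q) = q_a* - q_b* eps. */
static void DQTransformPoint( DualQuat q, const float v[3], float out[3] )
{
    DualQuat pv = { { 0, 0, 0, 1 }, { v[0], v[1], v[2], 0 } };
    DualQuat c  = { { -q.a[0], -q.a[1], -q.a[2],  q.a[3] },
                    {  q.b[0],  q.b[1],  q.b[2], -q.b[3] } };
    DualQuat r  = DQMul( DQMul( q, pv ), c );
    out[0] = r.b[0]; out[1] = r.b[1]; out[2] = r.b[2];
}
```

Rotating (1, 0, 0) by 90° about z and then translating by (1, 2, 3) gives (1, 3, 3). Expanding these products by hand collapses them into the cross-product form used by `DQTransform` later in the deck.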

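Dual Quaternions
Geometric Interpretation
• Recall: $q = \cos\frac{\theta}{2} + \mathbf{s}\sin\frac{\theta}{2}$ ($\mathbf{s}$ = axis, $\theta$ = angle)
• $\rightarrow \hat{q} = \cos\frac{\hat{\theta}}{2} + \hat{s}\sin\frac{\hat{\theta}}{2}$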
• $\hat{\theta} = \theta_a + \theta_b\varepsilon$: dual angle (a dual number)
  – $t = 2 q_b q_a^*$: translation vector, in quaternion form
  – $\theta_a$: angle of rotation
  – $\theta_b = \langle t, s_a \rangle$: translation along $s_a$
• $\hat{s} = s_a + s_b\varepsilon$: unit dual quaternion with a 0 scalar part
  – $s_a = (0, s_x, s_y, s_z)$: direction of the axis of rotation
  – $s_b = \frac{1}{2}\left(s_a \times t \cot\frac{\theta_a}{2} + t\right) \times s_a$: moment of the rotation axis

Dual Quaternions
Screw Transform!
• Rotation about an axis followed by translation along that axis.
• All rigid transforms can be described this way.

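Dual Quaternion Blend Skinning
Simple Case:
$$DQB(\hat{q}_0, \hat{q}_1, t) = \frac{(1 - t)\hat{q}_0 + t\hat{q}_1}{\|(1 - t)\hat{q}_0 + t\hat{q}_1\|}$$
[Figure: $\hat{q}_0$, $\hat{q}_1$ and the blended $\hat{q}_{DQB}$]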

$$DQB(\hat{q}_0, \ldots, \hat{q}_n, w_0, \ldots, w_n) = \frac{w_0\hat{q}_0 + \ldots + w_n\hat{q}_n}{\|w_0\hat{q}_0 + \ldots + w_n\hat{q}_n\|}$$
Unlike with matrix blending, the result is always a rigid transform!

• Very accurate, but not perfect.
• Can introduce accelerations when input dual quaternions differ greatly.
  – 8.15 degrees: maximum rotational deviation
  – 15.1%: maximum translational deviation
• A modified SLERP can be used if absolute accuracy is required:
$$SLERP(\hat{q}_0, \hat{q}_1, t) = (\hat{q}_1 \hat{q}_0^*)^t \hat{q}_0$$
  – The efficiency tradeoff is usually not worth it.

Must handle antipodality!
• Polarity rule: $\hat{q} \equiv -\hat{q}$
• We want: $\forall\, \hat{q}_0, \ldots, \hat{q}_n : \langle q_i, q_j \rangle \ge 0$
• Fix up all dual quaternions prior to skinning.
[Figure: $\hat{q}$ and $-\hat{q}$]

for ( all bones’ unit dual quaternions, dq[i] )
    if ( InnerProduct( dq[i], dq[parent[i]] ) < 0.0 )
        Negate( dq[i] );
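The pseudocode above can be made concrete as follows. This C sketch assumes bones are sorted parent-before-child with the root's parent set to -1; those assumptions, and the `DualQuat` struct, are illustrative. It compares only the non-dual parts and negates both parts together. Checking against parents keeps neighboring bones consistent, which covers the usual case of a vertex weighted to adjacent bones, but it does not guarantee $\langle q_i, q_j \rangle \ge 0$ for every pair.

```c
#include <assert.h>

typedef struct { float a[4], b[4]; } DualQuat;  /* [x, y, z, w] per part */

/* Polarity fix-up, run once per frame before skinning. */
static void FixAntipodality( DualQuat dq[], const int parent[], int numBones )
{
    for ( int i = 0; i < numBones; ++i )
    {
        int p = parent[i];
        if ( p < 0 )
            continue;               /* root: nothing to compare against */
        assert( p < i );            /* parent already fixed up */
        float dot = dq[i].a[0]*dq[p].a[0] + dq[i].a[1]*dq[p].a[1]
                  + dq[i].a[2]*dq[p].a[2] + dq[i].a[3]*dq[p].a[3];
        if ( dot < 0.0f )
            for ( int k = 0; k < 4; ++k )
            {
                dq[i].a[k] = -dq[i].a[k];
                dq[i].b[k] = -dq[i].b[k];
            }
    }
}
```

A common per-vertex alternative, often done in the skinning shader, flips each influence whose non-dual part has a negative inner product with the first influence's non-dual part before blending.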

Generating a Dual Quaternion
// Input: unit quaternion 'q0', stored [x, y, z, w] (scalar last)
// Input: translation vector 't'
// Output: unit dual quaternion 'dq' (dq[0] = non-dual part, dq[1] = dual part)
static void QuatTrans2UDQ( const float q0[4], const float t[3], float dq[2][4] )
{
    // Non-Dual Part: dq[0] = q0
    for ( int i = 0; i < 4; i++ )
        dq[0][i] = q0[i];
    // Dual Part: dq[1] = ((0,t[0],t[1],t[2])/2)*q0  (scalar-first notation)
    dq[1][3] = -0.5f*( t[0]*q0[0] + t[1]*q0[1] + t[2]*q0[2]); // Scalar Component
    dq[1][0] =  0.5f*( t[0]*q0[3] + t[1]*q0[2] - t[2]*q0[1]); // Vector Component 0
    dq[1][1] =  0.5f*(-t[0]*q0[2] + t[1]*q0[3] + t[2]*q0[0]); // Vector Component 1
    dq[1][2] =  0.5f*( t[0]*q0[1] - t[1]*q0[0] + t[2]*q0[3]); // Vector Component 2
}

Dual Quaternion Blending
// Input: array of dual quaternions 'dqIn'
// Input: array of weights 'w', totaling 1.0
// Input: size of the above two arrays (> 1)
// Output: the blended, not yet normalized dual quaternion 'dqOut'
//         (DQTransform() normalizes it)
// Engine helpers (assumed): Vec4Scale( a, s, out ): out = s*a;
//                           Vec4Mad( a, s, b, out ): out = a + s*b
static void DQB( const float dqIn[][2][4], const float w[], int numDQ,
                 float dqOut[2][4] )
{
    // dqOut = w[0]*dqIn[0]
    Vec4Scale( dqIn[0][0], w[0], dqOut[0] );
    Vec4Scale( dqIn[0][1], w[0], dqOut[1] );
    for ( int i = 1; i < numDQ; ++i )
    {
        // dqOut += w[i]*dqIn[i]
        Vec4Mad( dqOut[0], w[i], dqIn[i][0], dqOut[0] );
        Vec4Mad( dqOut[1], w[i], dqIn[i][1], dqOut[1] );
    }
}

Transformation Using a Dual Quaternion
#include <string.h> // memcpy
// vec3_t / vec4_t: float[3] / float[4]. I_sqrt() and the Vec3*/Vec4*
// helpers are engine routines (assumed): VecNScale( a, s, out ): out = s*a;
// Vec3Mad( a, s, b, out ): out = a + s*b; Vec3Cross( a, b, out ): out = a x b.
// Input: dual quaternion 'dq', stored [x, y, z, w] per part; the non-dual
//        part need not be unit length because it is normalized below
// Input: input position 'vecIn'
// Output: rigidly transformed position 'vecOut'
static void DQTransform( const float dq[2][4],
                         const vec3_t vecIn, vec3_t vecOut )
{
    vec4_t q0, q1;
    float a0, ae, recipDeLen;
    vec3_t d0, de, temp1, temp2, temp3, temp4, temp5;
    vec3_t temp6, temp7, temp8, temp9, temp10, temp11;
    recipDeLen = 1.0f / I_sqrt( dq[0][3]*dq[0][3]
                              + dq[0][0]*dq[0][0]
                              + dq[0][1]*dq[0][1]
                              + dq[0][2]*dq[0][2] );
    // Normalize both parts of the dual quaternion, based
    // on the length of the non-dual part.
    Vec4Scale( dq[0], recipDeLen, q0 );
    Vec4Scale( dq[1], recipDeLen, q1 );
    // Isolate the scalar and vector parts of both
    // quaternions. This is just for code clarity and can
    // be omitted for SIMD optimization.
    a0 = q0[3];
    ae = q1[3];
    memcpy( d0, &q0[0], sizeof( d0 ));
    memcpy( de, &q1[0], sizeof( de ));
    // Transform 'vecIn' by the dual quaternion to produce 'vecOut'
    // (dq * v * quaternion & dual conjugate of dq), expanded as:
    // vecOut = vecIn + 2*d0 x (d0 x vecIn + a0*vecIn)   (rotation)
    //        + 2*(a0*de - ae*d0 + d0 x de)             (translation)
    Vec3Cross( d0, vecIn, temp1 );
    Vec3Mad( temp1, a0, vecIn, temp2 );
    Vec3Scale( de, a0, temp3 );
    Vec3Scale( d0, ae, temp4 );
    Vec3Cross( d0, de, temp5 );
    Vec3Sub( temp3, temp4, temp6 );
    Vec3Add( temp6, temp5, temp7 );
    Vec3Scale( temp7, 2.0f, temp8 );
    Vec3Scale( d0, 2.0f, temp9 );
    Vec3Cross( temp9, temp2, temp10 );
    Vec3Add( vecIn, temp10, temp11 );
    Vec3Add( temp11, temp8, vecOut );
}

[Chart: Instruction Counts (XB360 VMX): Matrix Skinning (column-major) vs. DQB Skinning]

[Chart: Instruction Counts (XB360 GPU): Matrix Skinning (row-major) vs. DQB Skinning for Blending (2), Blending (3), Blending (4), Transform Pos, and Transform Vec]

[Table: On GCN GPU: DQ Skinning vs. Matrix Skinning, compared on aggregate $ (cache) efficiency, VGPR count, memory stalls, and DRAM footprint]

DQ vs. Matrix Skinning
DQ Skinning is ~24% faster*
*Depends heavily on vertex layout, tangent space quality, number of bones, and weighting distributions.

Optional Optimizations:
• Compress quaternions
  – 10:10:10:2 format for non-dual component
• Tune max waves/SIMD
• Generate skinning transforms on the GPU

IK
Especially when animations are played on characters with different or custom proportions.

Ragdolls: Can you spot all the artifacts DQB would resolve?

Pros
• GPU/SIMD friendly
• No asset changes required
• Cheaper transform blending
• More cache friendly
• Requires less memory/constants
• Conducive to procedural motions
• (Mostly) replaces the need for the rotational split bones mentioned earlier
• Can be enabled selectively (per-LOD, per-submesh, high-end machines only)
Cons
• Less intuitive than matrices
• Local scaling must be handled separately
• The vertex transform itself costs more ALU
• Still not 100% accurate
  – Potential bulge artifacts
• Not widely adopted in games (yet)

No more flat asses!

Skinning
• “Bulging-free dual quaternion skinning” (Kim, 2014)

Skinning
1. $Bulge(v_t) = \mathrm{CalcBulge}(v_0, FKbones_0, FKbones_{min}, FKbones_{max}, ProceduralBones)$
2. Solve for: bone weights on $v_0$ that minimize $Bulge(v_t)$ for all $t$.
3. Re-weight artist-selected vertices in Maya/Max.

Conclusion
• The optimal model skinning approach can vary per platform.
• Give dual quaternion skinning a look.
• Don’t assume skinning is a “solved problem” (unless you’re Leatherface).

Rulon@InfinityWard.com
Questions?
