SlideShare a Scribd company logo
1 of 114
Download to read offline
Tony Albrecht – Technical Consultant
Developer Services
Sony Computer Entertainment Europe
Research & Development Division
Pitfalls of Object Oriented Programming
Slide 2
• A quick look at Object Oriented (OO)
programming
• A common example
• Optimisation of that example
• Summary
What I will be covering
Slide 3
• What is OO programming?
– a programming paradigm that uses "objects" – data structures
consisting of datafields and methods together with their interactions – to
design applications and computer programs.
(Wikipedia)
• Includes features such as
– Data abstraction
– Encapsulation
– Polymorphism
– Inheritance
Object Oriented (OO) Programming
Slide 4
• OO programming allows you to think about
problems in terms of objects and their
interactions.
• Each object is (ideally) self contained
– Contains its own code and data.
– Defines an interface to its code and data.
• Each object can be perceived as a „black box‟.
What’s OOP for?
Slide 5
• If objects are self contained then they can be
– Reused.
– Maintained without side effects.
– Used without understanding internal
implementation/representation.
• This is good, yes?
Objects
Slide 6
• Well, yes
• And no.
• First some history.
Are Objects Good?
Slide 7
A Brief History of C++
C++ development started
1979 2009
Slide 8
A Brief History of C++
Named “C++”
1979 20091983
Slide 9
A Brief History of C++
First Commercial release
1979 20091985
Slide 10
A Brief History of C++
Release of v2.0
1979 20091989
Slide 11
A Brief History of C++
Release of v2.0
1979 20091989
Added
• multiple inheritance,
• abstract classes,
• static member functions,
• const member functions
• protected members.
Slide 12
A Brief History of C++
Standardised
1979 20091998
Slide 13
A Brief History of C++
Updated
1979 20092003
Slide 14
A Brief History of C++
C++0x
1979 2009 ?
Slide 15
• Many more features have
been added to C++
• CPUs have become much
faster.
• Transition to multiple cores
• Memory has become faster.
So what has changed since 1979?
http://www.vintagecomputing.com
Slide 16
CPU performance
Computer architecture: a quantitative approach
By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau
Slide 17
CPU/Memory performance
Computer architecture: a quantitative approach
By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau
Slide 18
• One of the biggest changes is that memory
access speeds are far slower (relatively)
– 1980: RAM latency ~ 1 cycle
– 2009: RAM latency ~ 400+ cycles
• What can you do in 400 cycles?
What has changed since 1979?
Slide 19
• OO classes encapsulate code and data.
• So, an instantiated object will generally
contain all data associated with it.
What has this to do with OO?
Slide 20
• With modern HW (particularly consoles),
excessive encapsulation is BAD.
• Data flow should be fundamental to your
design (Data Oriented Design)
My Claim
Slide 21
• Base Object class
– Contains general data
• Node
– Container class
• Modifier
– Updates transforms
• Drawable/Cube
– Renders objects
Consider a simple OO Scene Tree
Slide 22
• Each object
– Maintains bounding sphere for culling
– Has transform (local and world)
– Dirty flag (optimisation)
– Pointer to Parent
Object
Slide 23
Objects
Class Definition Memory Layout
Each square is
4 bytes
Slide 24
• Each Node is an object, plus
– Has a container of other objects
– Has a visibility flag.
Nodes
Slide 25
Nodes
Class Definition Memory Layout
Slide 26
Consider the following code…
• Update the world transform and world
space bounding sphere for each object.
Slide 27
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
Slide 28
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
What‟s wrong with this
code?
Slide 29
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
If m_Dirty=false then we get branch
misprediction which costs 23 or 24
cycles.
Slide 30
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheresCalculation of the world bounding sphere
takes only 12 cycles.
Slide 31
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheresSo using a dirty flag here is actually
slower than not using one (in the case
where it is false)
Slide 32
Lets illustrate cache usage
Main Memory L2 Cache
Each cache line is
128 bytes
Slide 33
Cache usage
Main Memory
parentTransform is already
in the cache (somewhere)
L2 Cache
Slide 34
Cache usage
Main Memory
Assume this is a 128byte
boundary (start of cacheline) L2 Cache
Slide 35
Cache usage
Main Memory
Load m_Transform into cache
L2 Cache
Slide 36
Cache usage
Main Memory
m_WorldTransform is stored via
cache (write-back)
L2 Cache
Slide 37
Cache usage
Main Memory
Next it loads m_Objects
L2 Cache
Slide 38
Cache usage
Main Memory
Then a pointer is pulled from
somewhere else (Memory
managed by std::vector)
L2 Cache
Slide 39
Cache usage
Main Memory
vtbl ptr loaded into Cache
L2 Cache
Slide 40
Cache usage
Main Memory
Look up virtual function
L2 Cache
Slide 41
Cache usage
Main Memory
Then branch to that code
(load in instructions)
L2 Cache
Slide 42
Cache usage
Main Memory
New code checks dirty flag then
sets world bounding sphere
L2 Cache
Slide 43
Cache usage
Main Memory
Node‟s World Bounding Sphere
is then Expanded
L2 Cache
Slide 44
Cache usage
Main Memory
Then the next Object is
processed
L2 Cache
Slide 45
Cache usage
Main Memory
First object costs at least 7
cache misses
L2 Cache
Slide 46
Cache usage
Main Memory
Subsequent objects cost at least
2 cache misses each
L2 Cache
Slide 47
• 11,111 nodes/objects in a
tree 5 levels deep
• Every node being
transformed
• Hierarchical culling of tree
• Render method is empty
The Test
Slide 48
Performance
This is the time
taken just to
traverse the tree!
Slide 49
Why is it so slow?
~22ms
Slide 50
Look at GetWorldBoundingSphere()
Slide 51
Samples can be a little
misleading at the source
code level
Slide 52
if(!m_Dirty) comparison
Slide 53
Stalls due to the load 2
instructions earlier
Slide 54
Similarly with the matrix
multiply
Slide 55
Some rough calculations
Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms
Slide 56
Some rough calculations
Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms
L2 Cache misses: 36,345 @ 400 cycles each ~= 4.54ms
Slide 57
• From Tuner, ~ 3 L2 cache misses per
object
– These cache misses are mostly sequential
(more than 1 fetch from main memory can
happen at once)
– Code/cache miss/code/cache miss/code…
Slide 58
• How can we fix it?
• And still keep the same functionality and
interface?
Slow memory is the problem here
Slide 59
• Use homogenous, sequential sets of data
The first step
Slide 60
Homogeneous Sequential Data
Slide 61
• Use custom allocators
– Minimal impact on existing code
• Allocate contiguous
– Nodes
– Matrices
– Bounding spheres
Generating Contiguous Data
Slide 62
Performance
19.6ms -> 12.9ms
35% faster just by
moving things around in
memory!
Slide 63
• Process data in order
• Use implicit structure for hierarchy
– Minimise to and fro from nodes.
• Group logic to optimally use what is already in
cache.
• Remove regularly called virtuals.
What next?
Slide 64
Hierarchy
Node
We start with
a parent Node
Slide 65
Hierarchy
Node
Node NodeWhich has children
nodes
Slide 66
Hierarchy
Node
Node Node
And they have a
parent
Slide 67
Hierarchy
Node
Node Node
Node Node Node NodeNode Node Node Node
And they have
children
Slide 68
Hierarchy
Node
Node Node
Node Node Node NodeNode Node Node Node
And they all have
parents
Slide 69
Hierarchy
Node
Node Node
Node Node Node NodeNode Node Node Node
A lot of this
information can be
inferred
Node Node Node NodeNode Node Node Node
Slide 70
Hierarchy
Node
Node Node
Node Node Node NodeNode Node Node Node
Use a set of arrays,
one per hierarchy
level
Slide 71
Hierarchy
Node
Node Node
Node Node Node NodeNode Node Node Node
Parent has 2
children:2
:4
:4
Children have 4
children
Slide 72
Hierarchy
Node
Node Node
Node Node Node NodeNode Node Node Node
Ensure nodes and their
data are contiguous in
memory
:2
:4
:4
Slide 73
• Make the processing global rather than
local
– Pull the updates out of the objects.
• No more virtuals
– Easier to understand too – all code in one
place.
Slide 74
• OO version
– Update transform top down and expand WBS
bottom up
Need to change some things…
Slide 75
Node
Node Node
Node Node Node NodeNode Node Node Node
Update
transform
Slide 76
Node
Node Node
Node Node Node NodeNode Node Node Node
Update
transform
Slide 77
Node
Node Node
Node Node Node NodeNode Node Node Node
Update transform and
world bounding sphere
Slide 78
Node
Node Node
Node Node Node NodeNode Node Node Node
Add bounding sphere
of child
Slide 79
Node
Node Node
Node Node Node NodeNode Node Node Node
Update transform and
world bounding sphere
Slide 80
Node
Node Node
Node Node Node NodeNode Node Node Node
Add bounding sphere
of child
Slide 81
Node
Node Node
Node Node Node NodeNode Node Node Node
Update transform and
world bounding sphere
Slide 82
Node
Node Node
Node Node Node NodeNode Node Node Node
Add bounding sphere
of child
Slide 83
• Hierarchical bounding spheres pass info up
• Transforms cascade down
• Data use and code is „striped‟.
– Processing is alternating
Slide 84
• To do this with a „flat‟ hierarchy, break it
into 2 passes
– Update the transforms and bounding
spheres(from top down)
– Expand bounding spheres (bottom up)
Conversion to linear
Slide 85
Transform and BS updates
Node
Node Node
Node Node Node NodeNode Node Node Node
For each node at each level (top down)
{
multiply world transform by parent‟s
transform wbs by world transform
}
:2
:4
:4
Slide 86
Update bounding sphere hierarchies
Node
Node Node
Node Node Node NodeNode Node Node Node
For each node at each level (bottom up)
{
add wbs to parent‟s
}
:2
:4
:4
For each node at each level (bottom up)
{
add wbs to parent‟s
cull wbs against frustum
}
Slide 87
Update Transform and Bounding Sphere
How many children nodes
to process
Slide 88
Update Transform and Bounding Sphere
For each child, update
transform and bounding
sphere
Slide 89
Update Transform and Bounding Sphere
Note the contiguous arrays
Slide 90
So, what’s happening in the cache?
Parent
Children
Parent Data
Childrens‟ Data
Children‟s node
data not needed
Unified L2 Cache
Slide 91
Load parent and its transform
Parent
Parent Data
Childrens‟ Data
Unified L2 Cache
Slide 92
Load child transform and set world transform
Parent
Parent Data
Childrens‟ Data
Unified L2 Cache
Slide 93
Load child BS and set WBS
Parent
Parent Data
Childrens‟ Data
Unified L2 Cache
Slide 94
Load child BS and set WBS
Parent
Parent Data
Childrens‟ Data
Next child is calculated with
no extra cache misses ! Unified L2 Cache
Slide 95
Load child BS and set WBS
Parent
Parent Data
Childrens‟ Data
The next 2 children incur 2
cache misses in total Unified L2 Cache
Slide 96
Prefetching
Parent
Parent Data
Childrens‟ Data
Because all data is linear, we
can predict what memory will
be needed in ~400 cycles
and prefetch
Unified L2 Cache
Slide 97
• Tuner scans show about 1.7 cache misses
per node.
• But, these misses are much more frequent
– Code/cache miss/cache miss/code
– Less stalling
Slide 98
Performance
19.6 -> 12.9 -> 4.8ms
Slide 99
• Data accesses are now predictable
• Can use prefetch (dcbt) to warm the cache
– Data streams can be tricky
– Many reasons for stream termination
– Easier to just use dcbt blindly
• (look ahead x number of iterations)
Prefetching
Slide 100
• Prefetch a predetermined number of
iterations ahead
• Ignore incorrect prefetches
Prefetching example
Slide 101
Performance
19.6 -> 12.9 -> 4.8 -> 3.3ms
Slide 102
• This example makes very heavy use of the
cache
• This can affect other threads‟ use of the
cache
– Multiple threads with heavy cache use may
thrash the cache
A Warning on Prefetching
Slide 103
The old scan
~22ms
Slide 104
The new scan
~16.6ms
Slide 105
Up close
~16.6ms
Slide 106
Looking at the code (samples)
Slide 107
Performance counters
Branch mispredictions: 2,867 (cf. 47,000)
L2 cache misses: 16,064 (cf 36,000)
Slide 108
• Just reorganising data locations was a win
• Data + code reorganisation= dramatic
improvement.
• + prefetching equals even more WIN.
In Summary
Slide 109
• Be careful not to design yourself into a corner
• Consider data in your design
– Can you decouple data from objects?
– …code from objects?
• Be aware of what the compiler and HW are
doing
OO is not necessarily EVIL
Slide 110
• Optimise for data first, then code.
– Memory access is probably going to be your
biggest bottleneck
• Simplify systems
– KISS
– Easier to optimise, easier to parallelise
Its all about the memory
Slide 111
• Keep code and data homogenous
– Avoid introducing variations
– Don‟t test for exceptions – sort by them.
• Not everything needs to be an object
– If you must have a pattern, then consider
using Managers
Homogeneity
Slide 112
• You are writing a GAME
– You have control over the input data
– Don‟t be afraid to preformat it – drastically if
need be.
• Design for specifics, not generics
(generally).
Remember
Slide 113
• Better performance
• Better realisation of code optimisations
• Often simpler code
• More parallelisable code
Data Oriented Design Delivers
Slide 114
• Questions?
The END

More Related Content

What's hot

Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 Eran Shlomo
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr..."Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...Edge AI and Vision Alliance
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsHPCC Systems
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Oswald Campesato
 

What's hot (6)

Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr..."Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC Systems
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 

Similar to Pitfalls of Object Oriented Programming

Pitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYPitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYAnaya Medias Swiss
 
Pitfalls of object_oriented_programming_gcap_09
Pitfalls of object_oriented_programming_gcap_09Pitfalls of object_oriented_programming_gcap_09
Pitfalls of object_oriented_programming_gcap_09Royce Lu
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++Mike Acton
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptxasdq4
 
How shit works: the CPU
How shit works: the CPUHow shit works: the CPU
How shit works: the CPUTomer Gabel
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeAlessio Coltellacci
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
Creating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the CloudCreating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the CloudAlexander Al Basosi
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
Tachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBMTachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBMShaoshan Liu
 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopHéloïse Nonne
 
Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Dmitry Alexandrov
 
Scaling up Machine Learning Algorithms for Classification
Scaling up Machine Learning Algorithms for ClassificationScaling up Machine Learning Algorithms for Classification
Scaling up Machine Learning Algorithms for Classificationsmatsus
 
OpenPOWER Workshop in Silicon Valley
OpenPOWER Workshop in Silicon ValleyOpenPOWER Workshop in Silicon Valley
OpenPOWER Workshop in Silicon ValleyGanesan Narayanasamy
 
Fast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on TachyonFast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on TachyonAlluxio, Inc.
 

Similar to Pitfalls of Object Oriented Programming (20)

Pitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYPitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONY
 
Pitfalls of object_oriented_programming_gcap_09
Pitfalls of object_oriented_programming_gcap_09Pitfalls of object_oriented_programming_gcap_09
Pitfalls of object_oriented_programming_gcap_09
 
JavaFX 101
JavaFX 101JavaFX 101
JavaFX 101
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptx
 
How shit works: the CPU
How shit works: the CPUHow shit works: the CPU
How shit works: the CPU
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient code
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Gpgpu intro
Gpgpu introGpgpu intro
Gpgpu intro
 
Creating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the CloudCreating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the Cloud
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
Tachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBMTachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBM
 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and Hadoop
 
PyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc AltedPyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc Alted
 
Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Java on the GPU: Where are we now?
Java on the GPU: Where are we now?
 
Scaling up Machine Learning Algorithms for Classification
Scaling up Machine Learning Algorithms for ClassificationScaling up Machine Learning Algorithms for Classification
Scaling up Machine Learning Algorithms for Classification
 
OpenPOWER Workshop in Silicon Valley
OpenPOWER Workshop in Silicon ValleyOpenPOWER Workshop in Silicon Valley
OpenPOWER Workshop in Silicon Valley
 
Fast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on TachyonFast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on Tachyon
 

More from Slide_N

New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiSlide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSlide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60 Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationSlide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignSlide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: TheorySlide_N
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMSlide_N
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionSlide_N
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerSlide_N
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesSlide_N
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
MLAA on PS3
MLAA on PS3MLAA on PS3
MLAA on PS3Slide_N
 
SPU gameplay
SPU gameplaySPU gameplay
SPU gameplaySlide_N
 
Insomniac Physics
Insomniac PhysicsInsomniac Physics
Insomniac PhysicsSlide_N
 
SPU Shaders
SPU ShadersSPU Shaders
SPU ShadersSlide_N
 
SPU Physics
SPU PhysicsSPU Physics
SPU PhysicsSlide_N
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Slide_N
 

More from Slide_N (20)

New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of Destruction
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM Power
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution Continues
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
MLAA on PS3
MLAA on PS3MLAA on PS3
MLAA on PS3
 
SPU gameplay
SPU gameplaySPU gameplay
SPU gameplay
 
Insomniac Physics
Insomniac PhysicsInsomniac Physics
Insomniac Physics
 
SPU Shaders
SPU ShadersSPU Shaders
SPU Shaders
 
SPU Physics
SPU PhysicsSPU Physics
SPU Physics
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2
 

Recently uploaded

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Pitfalls of Object Oriented Programming

  • 1. Tony Albrecht – Technical Consultant Developer Services Sony Computer Entertainment Europe Research & Development Division Pitfalls of Object Oriented Programming
  • 2. Slide 2 • A quick look at Object Oriented (OO) programming • A common example • Optimisation of that example • Summary What I will be covering
  • 3. Slide 3 • What is OO programming? – a programming paradigm that uses "objects" – data structures consisting of datafields and methods together with their interactions – to design applications and computer programs. (Wikipedia) • Includes features such as – Data abstraction – Encapsulation – Polymorphism – Inheritance Object Oriented (OO) Programming
  • 4. Slide 4 • OO programming allows you to think about problems in terms of objects and their interactions. • Each object is (ideally) self contained – Contains its own code and data. – Defines an interface to its code and data. • Each object can be perceived as a „black box‟. What’s OOP for?
  • 5. Slide 5 • If objects are self contained then they can be – Reused. – Maintained without side effects. – Used without understanding internal implementation/representation. • This is good, yes? Objects
  • 6. Slide 6 • Well, yes • And no. • First some history. Are Objects Good?
  • 7. Slide 7 A Brief History of C++ C++ development started 1979 2009
  • 8. Slide 8 A Brief History of C++ Named “C++” 1979 20091983
  • 9. Slide 9 A Brief History of C++ First Commercial release 1979 20091985
  • 10. Slide 10 A Brief History of C++ Release of v2.0 1979 20091989
  • 11. Slide 11 A Brief History of C++ Release of v2.0 1979 20091989 Added • multiple inheritance, • abstract classes, • static member functions, • const member functions • protected members.
  • 12. Slide 12 A Brief History of C++ Standardised 1979 20091998
  • 13. Slide 13 A Brief History of C++ Updated 1979 20092003
  • 14. Slide 14 A Brief History of C++ C++0x 1979 2009 ?
  • 15. Slide 15 • Many more features have been added to C++ • CPUs have become much faster. • Transition to multiple cores • Memory has become faster. So what has changed since 1979? http://www.vintagecomputing.com
  • 16. Slide 16 CPU performance Computer architecture: a quantitative approach By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau
  • 17. Slide 17 CPU/Memory performance Computer architecture: a quantitative approach By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau
  • 18. Slide 18 • One of the biggest changes is that memory access speeds are far slower (relatively) – 1980: RAM latency ~ 1 cycle – 2009: RAM latency ~ 400+ cycles • What can you do in 400 cycles? What has changed since 1979?
  • 19. Slide 19 • OO classes encapsulate code and data. • So, an instantiated object will generally contain all data associated with it. What has this to do with OO?
  • 20. Slide 20 • With modern HW (particularly consoles), excessive encapsulation is BAD. • Data flow should be fundamental to your design (Data Oriented Design) My Claim
  • 21. Slide 21 • Base Object class – Contains general data • Node – Container class • Modifier – Updates transforms • Drawable/Cube – Renders objects Consider a simple OO Scene Tree
  • 22. Slide 22 • Each object – Maintains bounding sphere for culling – Has transform (local and world) – Dirty flag (optimisation) – Pointer to Parent Object
  • 23. Slide 23 Objects Class Definition Memory Layout Each square is 4 bytes
  • 24. Slide 24 • Each Node is an object, plus – Has a container of other objects – Has a visibility flag. Nodes
  • 26. Slide 26 Consider the following code… • Update the world transform and world space bounding sphere for each object.
  • 27. Slide 27 Consider the following code… • Leaf nodes (objects) return transformed bounding spheres
  • 28. Slide 28 Consider the following code… • Leaf nodes (objects) return transformed bounding spheres What‟s wrong with this code?
  • 29. Slide 29 Consider the following code… • Leaf nodes (objects) return transformed bounding spheres If m_Dirty=false then we get branch misprediction which costs 23 or 24 cycles.
  • 30. Slide 30 Consider the following code… • Leaf nodes (objects) return transformed bounding spheresCalculation of the world bounding sphere takes only 12 cycles.
  • 31. Slide 31 Consider the following code… • Leaf nodes (objects) return transformed bounding spheresSo using a dirty flag here is actually slower than not using one (in the case where it is false)
  • 32. Slide 32 Lets illustrate cache usage Main Memory L2 Cache Each cache line is 128 bytes
  • 33. Slide 33 Cache usage Main Memory parentTransform is already in the cache (somewhere) L2 Cache
  • 34. Slide 34 Cache usage Main Memory Assume this is a 128byte boundary (start of cacheline) L2 Cache
  • 35. Slide 35 Cache usage Main Memory Load m_Transform into cache L2 Cache
  • 36. Slide 36 Cache usage Main Memory m_WorldTransform is stored via cache (write-back) L2 Cache
  • 37. Slide 37 Cache usage Main Memory Next it loads m_Objects L2 Cache
  • 38. Slide 38 Cache usage Main Memory Then a pointer is pulled from somewhere else (Memory managed by std::vector) L2 Cache
  • 39. Slide 39 Cache usage Main Memory vtbl ptr loaded into Cache L2 Cache
  • 40. Slide 40 Cache usage Main Memory Look up virtual function L2 Cache
  • 41. Slide 41 Cache usage Main Memory Then branch to that code (load in instructions) L2 Cache
  • 42. Slide 42 Cache usage Main Memory New code checks dirty flag then sets world bounding sphere L2 Cache
  • 43. Slide 43 Cache usage Main Memory Node‟s World Bounding Sphere is then Expanded L2 Cache
  • 44. Slide 44 Cache usage Main Memory Then the next Object is processed L2 Cache
  • 45. Slide 45 Cache usage Main Memory First object costs at least 7 cache misses L2 Cache
  • 46. Slide 46 Cache usage Main Memory Subsequent objects cost at least 2 cache misses each L2 Cache
  • 47. Slide 47 • 11,111 nodes/objects in a tree 5 levels deep • Every node being transformed • Hierarchical culling of tree • Render method is empty The Test
  • 48. Slide 48 Performance This is the time taken just to traverse the tree!
  • 49. Slide 49 Why is it so slow? ~22ms
  • 50. Slide 50 Look at GetWorldBoundingSphere()
  • 51. Slide 51 Samples can be a little misleading at the source code level
  • 53. Slide 53 Stalls due to the load 2 instructions earlier
  • 54. Slide 54 Similarly with the matrix multiply
  • 55. Slide 55 Some rough calculations Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms
  • 56. Slide 56 Some rough calculations Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms L2 Cache misses: 36,345 @ 400 cycles each ~= 4.54ms
  • 57. Slide 57 • From Tuner, ~ 3 L2 cache misses per object – These cache misses are mostly sequential (more than 1 fetch from main memory can happen at once) – Code/cache miss/code/cache miss/code…
  • 58. Slide 58 • How can we fix it? • And still keep the same functionality and interface? Slow memory is the problem here
  • 59. Slide 59 • Use homogenous, sequential sets of data The first step
  • 61. Slide 61 • Use custom allocators – Minimal impact on existing code • Allocate contiguous – Nodes – Matrices – Bounding spheres Generating Contiguous Data
  • 62. Slide 62 Performance 19.6ms -> 12.9ms 35% faster just by moving things around in memory!
  • 63. Slide 63 • Process data in order • Use implicit structure for hierarchy – Minimise to and fro from nodes. • Group logic to optimally use what is already in cache. • Remove regularly called virtuals. What next?
  • 64. Slide 64 Hierarchy Node We start with a parent Node
  • 67. Slide 67 Hierarchy Node Node Node Node Node Node NodeNode Node Node Node And they have children
  • 68. Slide 68 Hierarchy Node Node Node Node Node Node NodeNode Node Node Node And they all have parents
  • 69. Slide 69 Hierarchy Node Node Node Node Node Node NodeNode Node Node Node A lot of this information can be inferred Node Node Node NodeNode Node Node Node
  • 70. Slide 70 Hierarchy Node Node Node Node Node Node NodeNode Node Node Node Use a set of arrays, one per hierarchy level
  • 71. Slide 71 Hierarchy Node Node Node Node Node Node NodeNode Node Node Node Parent has 2 children:2 :4 :4 Children have 4 children
  • 72. Slide 72 Hierarchy Node Node Node Node Node Node NodeNode Node Node Node Ensure nodes and their data are contiguous in memory :2 :4 :4
  • 73. Slide 73 • Make the processing global rather than local – Pull the updates out of the objects. • No more virtuals – Easier to understand too – all code in one place.
  • 74. Slide 74 • OO version – Update transform top down and expand WBS bottom up Need to change some things…
  • 75. Slide 75 Node Node Node Node Node Node NodeNode Node Node Node Update transform
  • 76. Slide 76 Node Node Node Node Node Node NodeNode Node Node Node Update transform
  • 77. Slide 77 Node Node Node Node Node Node NodeNode Node Node Node Update transform and world bounding sphere
  • 78. Slide 78 Node Node Node Node Node Node NodeNode Node Node Node Add bounding sphere of child
  • 79. Slide 79 Node Node Node Node Node Node NodeNode Node Node Node Update transform and world bounding sphere
  • 80. Slide 80 Node Node Node Node Node Node NodeNode Node Node Node Add bounding sphere of child
  • 81. Slide 81 Node Node Node Node Node Node NodeNode Node Node Node Update transform and world bounding sphere
  • 82. Slide 82 Node Node Node Node Node Node NodeNode Node Node Node Add bounding sphere of child
  • 83. Slide 83 • Hierarchical bounding spheres pass info up • Transforms cascade down • Data use and code is „striped‟. – Processing is alternating
  • 84. Slide 84 • To do this with a „flat‟ hierarchy, break it into 2 passes – Update the transforms and bounding spheres(from top down) – Expand bounding spheres (bottom up) Conversion to linear
  • 85. Slide 85 Transform and BS updates Node Node Node Node Node Node NodeNode Node Node Node For each node at each level (top down) { multiply world transform by parent‟s transform wbs by world transform } :2 :4 :4
  • 86. Slide 86 Update bounding sphere hierarchies Node Node Node Node Node Node NodeNode Node Node Node For each node at each level (bottom up) { add wbs to parent‟s } :2 :4 :4 For each node at each level (bottom up) { add wbs to parent‟s cull wbs against frustum }
  • 87. Slide 87 Update Transform and Bounding Sphere How many children nodes to process
  • 88. Slide 88 Update Transform and Bounding Sphere For each child, update transform and bounding sphere
  • 89. Slide 89 Update Transform and Bounding Sphere Note the contiguous arrays
  • 90. Slide 90 So, what’s happening in the cache? Parent Children Parent Data Childrens‟ Data Children‟s node data not needed Unified L2 Cache
  • 91. Slide 91 Load parent and its transform Parent Parent Data Childrens‟ Data Unified L2 Cache
  • 92. Slide 92 Load child transform and set world transform Parent Parent Data Childrens‟ Data Unified L2 Cache
  • 93. Slide 93 Load child BS and set WBS Parent Parent Data Childrens‟ Data Unified L2 Cache
  • 94. Slide 94 Load child BS and set WBS Parent Parent Data Childrens‟ Data Next child is calculated with no extra cache misses ! Unified L2 Cache
  • 95. Slide 95 Load child BS and set WBS Parent Parent Data Childrens‟ Data The next 2 children incur 2 cache misses in total Unified L2 Cache
  • 96. Slide 96 Prefetching Parent Parent Data Childrens‟ Data Because all data is linear, we can predict what memory will be needed in ~400 cycles and prefetch Unified L2 Cache
  • 97. Slide 97 • Tuner scans show about 1.7 cache misses per node. • But, these misses are much more frequent – Code/cache miss/cache miss/code – Less stalling
  • 99. Slide 99 • Data accesses are now predictable • Can use prefetch (dcbt) to warm the cache – Data streams can be tricky – Many reasons for stream termination – Easier to just use dcbt blindly • (look ahead x number of iterations) Prefetching
  • 100. Slide 100 • Prefetch a predetermined number of iterations ahead • Ignore incorrect prefetches Prefetching example
  • 101. Slide 101 Performance 19.6 -> 12.9 -> 4.8 -> 3.3ms
  • 102. Slide 102 • This example makes very heavy use of the cache • This can affect other threads‟ use of the cache – Multiple threads with heavy cache use may thrash the cache A Warning on Prefetching
  • 103. Slide 103 The old scan ~22ms
  • 104. Slide 104 The new scan ~16.6ms
  • 106. Slide 106 Looking at the code (samples)
  • 107. Slide 107 Performance counters Branch mispredictions: 2,867 (cf. 47,000) L2 cache misses: 16,064 (cf 36,000)
  • 108. Slide 108 • Just reorganising data locations was a win • Data + code reorganisation= dramatic improvement. • + prefetching equals even more WIN. In Summary
  • 109. Slide 109 • Be careful not to design yourself into a corner • Consider data in your design – Can you decouple data from objects? – …code from objects? • Be aware of what the compiler and HW are doing OO is not necessarily EVIL
  • 110. Slide 110 • Optimise for data first, then code. – Memory access is probably going to be your biggest bottleneck • Simplify systems – KISS – Easier to optimise, easier to parallelise Its all about the memory
  • 111. Slide 111 • Keep code and data homogenous – Avoid introducing variations – Don‟t test for exceptions – sort by them. • Not everything needs to be an object – If you must have a pattern, then consider using Managers Homogeneity
  • 112. Slide 112 • You are writing a GAME – You have control over the input data – Don‟t be afraid to preformat it – drastically if need be. • Design for specifics, not generics (generally). Remember
  • 113. Slide 113 • Better performance • Better realisation of code optimisations • Often simpler code • More parallelisable code Data Oriented Design Delivers