DreamWorks Animation

Intel® Software
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
DreamWorks Animation*: Slashing the Cost of 3D Matrix Math Using X-Form (Transform) Building Blocks
Alex Wells (presenter) & Martin Watt (DWA)
August 12 & 13, 2015
DWA* Character Animation: Speedup After XBB
[Chart: wall time before vs. after XBB] Motion System speedup 1.6x; overall speedup 1.2x
Agenda
• Motion System in DWA Character Animation
• Observed performance bottlenecks in Motion System
• 3D matrix transforms
• How would an ideal transform behave?
• XBB representation
• XBB deferred evaluation
• Results
How is a skeleton represented for animation?
• To represent the bones of a skeleton in 3D space, an animation tool builds a Hierarchy of Joints that records how they are connected.
– Typically a Directed Acyclic Graph of Joints
How is each Joint represented?
• Relative to its parent Joint (in Local Space), each Joint needs to model:
– Rotation as Euler angles (around the X, Y, and Z axes) and the rotation order
– Scale (of the X, Y, and Z axes)
– Shear (along the X, Y, and Z axes)
– Translation (X, Y, and Z components)
• Animation curves change values over time
– They drive the Joint’s attributes (rotation, translation, etc.)
How does the skeleton influence the skin?
• Deformers, which compute the final 3D vertices of a character’s skin, need a “Frame” of reference to apply offsets from.
• The “World Space” Position and Orientation of the Joints from the Hierarchy (skeleton) provide that “Frame” of reference.
Representing a “Frame” of reference
struct Matrix4x4
{
    double m[4][4];
};
• A 4x4 Matrix can represent the Position and Orientation of a Joint in World Space.
• When used in this manner, the 4x4 Matrix is commonly referred to as a 3D transform (x-form).
• A 4x4 Matrix is typically implemented literally as a 4x4 array of floating-point values.
Why a 4x4 Matrix?
• Rotation, Scale, Shear, and Translation can all be represented as 4x4 Matrices.
• Multiple 4x4 Matrices can be concatenated (multiplied) together into a single 4x4 Matrix.
• 3D points and 3D vectors (offsets) can be multiplied through a 4x4 Matrix to transform them to the position and orientation in “World Space” that the Matrix represents.
• For each Joint
– Matrices representing Scale, Shear, Rotation, and Translation are combined into a single “Local Space” 4x4 Matrix.
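As a minimal sketch of multiplying points and vectors through a 4x4 Matrix, assuming the row-vector convention implied by the translation matrix in these slides (translation stored in the bottom row) and an affine matrix whose last column is [0 0 0 1]. Vec3, transformPoint, and transformVector are hypothetical names used only for illustration; they are not taken from any DWA or Intel library.

```cpp
// Hypothetical helper types for illustration only.
struct Vec3 { double x, y, z; };

struct Matrix4x4 { double m[4][4]; };

// Transform a point (implicit w = 1): the translation in row 3 is applied.
inline Vec3 transformPoint(const Vec3& p, const Matrix4x4& M)
{
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2]
    };
}

// Transform a vector or offset (implicit w = 0): the translation is ignored.
inline Vec3 transformVector(const Vec3& v, const Matrix4x4& M)
{
    return {
        v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0],
        v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1],
        v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2]
    };
}
```

With a pure translation matrix, transformPoint moves the point while transformVector returns the offset unchanged, which is why deformers can apply offsets relative to a Joint's frame.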
How To Calculate The World Space Transform Of A Joint?
• By recursively combining the “Local Space” transform of a Joint with its parent Joint’s “Local Space” transform until the root of the hierarchy is reached, a 4x4 Matrix can be accumulated that represents the World Space of that Joint.
• As there are many Joints, it pays off to cache a “World Space” 4x4 Matrix at each Joint, so that a recursive walk up the hierarchy can stop early if a clean “World Space” has been cached.
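The cached recursive walk described above can be sketched as follows. This is an illustration under stated assumptions, not DWA's Hierarchy code: the Joint layout, the worldIsClean flag, and the helper names are hypothetical, and the row-vector convention (world = local * parentWorld) is assumed. Invalidating the caches of descendants when a parent changes is left out of the sketch.

```cpp
// Hypothetical minimal types; the real Hierarchy classes are not shown in the slides.
struct Matrix4x4 { double m[4][4]; };

inline Matrix4x4 makeIdentity()
{
    Matrix4x4 r = {};                       // all elements zero
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0;
    return r;
}

inline Matrix4x4 multiply(const Matrix4x4& a, const Matrix4x4& b)
{
    Matrix4x4 r = {};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r.m[row][col] += a.m[row][k] * b.m[k][col];
    return r;
}

struct Joint
{
    const Joint* parent;            // null at the root of the hierarchy
    Matrix4x4 local;                // "Local Space" transform
    mutable Matrix4x4 cachedWorld;  // cached "World Space" transform
    mutable bool worldIsClean;      // false until cachedWorld has been computed

    Joint(const Joint* iParent, const Matrix4x4& iLocal)
        : parent(iParent), local(iLocal),
          cachedWorld(makeIdentity()), worldIsClean(false) {}
};

// Walks up the hierarchy only as far as the first clean cache.
inline const Matrix4x4& worldSpace(const Joint& joint)
{
    if (!joint.worldIsClean)
    {
        joint.cachedWorld = joint.parent
            ? multiply(joint.local, worldSpace(*joint.parent))
            : joint.local;
        joint.worldIsClean = true;
    }
    return joint.cachedWorld;
}
```

Each call to multiply here is a full general-purpose concatenation, which is the cost the rest of the deck sets out to reduce.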
Hierarchy is the core of DWA’s Motion System
• Each time step, 1000s of Joint attributes change, invalidating a Hierarchy’s cached World Space and Local Space transforms.
• 1000s of operations on Hierarchy objects build up a complex skeleton.
• Imagine how many bones are used to represent a four-legged creature with a tail and wings.
• Due to the recursion, there is little opportunity for data vectorization or threading.
Motion System Is On The Critical Path
• Despite heavy parallelization of the Deformation System (green & yellow), it can’t start until the Motion System (red) finishes assembling a Hierarchy.
Wall Time Spent in Each Category
• Motion System dwarfs the other systems.
• Because of Amdahl’s law, our threading and vectorization improvements in the Deformation System cannot have a larger overall impact while the Motion System remains serial.
Time Spent Inside Each Type of Operator
• “hier_apply_fk_around_pivot” is the hottest operator
– Operates on a Hierarchy
– Verified in Intel® VTune™ Amplifier XE
• Several other “hier”-related operations take up other top hot spots.
Matrix Concatenation (Multiplication)
• Typical implementation
– Loop over rows
– Loop over columns
– Compute each result element by multiplying one row of the first matrix across one column of the other
• Simple enough, but how much work did we really just do?
struct Matrix4x4
{
    double m[4][4];

    // General concatenation: every element is treated as an unknown value.
    Matrix4x4 operator * (const Matrix4x4 &iOther) const
    {
        Matrix4x4 result;
        for (int r = 0; r < 4; ++r)
        {
            for (int c = 0; c < 4; ++c)
            {
                // Dot product of row r of this matrix with column c of iOther.
                double sum = 0.0;
                for (int k = 0; k < 4; ++k)
                {
                    sum += m[r][k] * iOther.m[k][c];
                }
                result.m[r][c] = sum;
            }
        }
        return result;
    }
};
Expensive Matrix Concatenation
• 64 multiplies (double precision): 16 result elements × 4 products each
• 48 additions (double precision): 16 result elements × 3 additions each
Matrix4x4 operator * (const Matrix4x4 &iOther)
{
Matrix4x4 result;
result.m[0][0] =
m[0][0]*iOther.m[0][0] +
m[0][1]*iOther.m[1][0] +
m[0][2]*iOther.m[2][0] +
m[0][3]*iOther.m[3][0];
result.m[0][1] =
m[0][0]*iOther.m[0][1] +
m[0][1]*iOther.m[1][1] +
m[0][2]*iOther.m[2][1] +
m[0][3]*iOther.m[3][1];
result.m[0][2] =
m[0][0]*iOther.m[0][2] +
m[0][1]*iOther.m[1][2] +
m[0][2]*iOther.m[2][2] +
m[0][3]*iOther.m[3][2];
result.m[0][3] =
m[0][0]*iOther.m[0][3] +
m[0][1]*iOther.m[1][3] +
m[0][2]*iOther.m[2][3] +
m[0][3]*iOther.m[3][3];
result.m[1][0] =
m[1][0]*iOther.m[0][0] +
m[1][1]*iOther.m[1][0] +
m[1][2]*iOther.m[2][0] +
m[1][3]*iOther.m[3][0];
result.m[1][1] =
m[1][0]*iOther.m[0][1] +
m[1][1]*iOther.m[1][1] +
m[1][2]*iOther.m[2][1] +
m[1][3]*iOther.m[3][1];
result.m[1][2] =
m[1][0]*iOther.m[0][2] +
m[1][1]*iOther.m[1][2] +
m[1][2]*iOther.m[2][2] +
m[1][3]*iOther.m[3][2];
result.m[1][3] =
m[1][0]*iOther.m[0][3] +
m[1][1]*iOther.m[1][3] +
m[1][2]*iOther.m[2][3] +
m[1][3]*iOther.m[3][3];
result.m[2][0] =
m[2][0]*iOther.m[0][0] +
m[2][1]*iOther.m[1][0] +
m[2][2]*iOther.m[2][0] +
m[2][3]*iOther.m[3][0];
result.m[2][1] =
m[2][0]*iOther.m[0][1] +
m[2][1]*iOther.m[1][1] +
m[2][2]*iOther.m[2][1] +
m[2][3]*iOther.m[3][1];
result.m[2][2] =
m[2][0]*iOther.m[0][2] +
m[2][1]*iOther.m[1][2] +
m[2][2]*iOther.m[2][2] +
m[2][3]*iOther.m[3][2];
result.m[2][3] =
m[2][0]*iOther.m[0][3] +
m[2][1]*iOther.m[1][3] +
m[2][2]*iOther.m[2][3] +
m[2][3]*iOther.m[3][3];
result.m[3][0] =
m[3][0]*iOther.m[0][0] +
m[3][1]*iOther.m[1][0] +
m[3][2]*iOther.m[2][0] +
m[3][3]*iOther.m[3][0];
result.m[3][1] =
m[3][0]*iOther.m[0][1] +
m[3][1]*iOther.m[1][1] +
m[3][2]*iOther.m[2][1] +
m[3][3]*iOther.m[3][1];
result.m[3][2] =
m[3][0]*iOther.m[0][2] +
m[3][1]*iOther.m[1][2] +
m[3][2]*iOther.m[2][2] +
m[3][3]*iOther.m[3][2];
result.m[3][3] =
m[3][0]*iOther.m[0][3] +
m[3][1]*iOther.m[1][3] +
m[3][2]*iOther.m[2][3] +
m[3][3]*iOther.m[3][3];
return result;
}
Are Any of Those 16 Matrix Values Known At Compile Time?
• Good news! YES!
• If you knew the exact transform a 4x4 matrix was representing, you would know quite a few 0 and 1 values at compile time.
Identity
[1][0][0][0]
[0][1][0][0]
[0][0][1][0]
[0][0][0][1]
Translation(x,y,z)
[1][0][0][0]
[0][1][0][0]
[0][0][1][0]
[x][y][z][1]
Shear(x,y,z)
[1][0][0][0]
[x][1][0][0]
[y][z][1][0]
[0][0][0][1]
Scale(x,y,z)
[x][0][0][0]
[0][y][0][0]
[0][0][z][0]
[0][0][0][1]
What About Rotations?
• Building rotation matrices is more expensive because of the need to call sine and cosine on the angle
• Rotations also have 0 and 1 values
Rotate X axis(angle)
[1][0][0][0]
[0][c][s][0]
[0][-s][c][0]
[0][0][0][1]
Rotate Y axis(angle)
[c][0][-s][0]
[0][1][0][0]
[s][0][c][0]
[0][0][0][1]
Rotate Z axis(angle)
[c][s][0][0]
[-s][c][0][0]
[0][0][1][0]
[0][0][0][1]
where s = sine(angle) and c = cosine(angle)
Huge Optimization Potential!
• Unfortunately, the matrix multiply method doesn’t know that the 4x4 Matrix it was passed has any 0 or 1 values
– So it cannot avoid performing math operations.
• Even if we had separate classes to represent the different transformations, and a version of the matrix multiply method for each:
– The result becomes a general 4x4 matrix.
– Chains of multiplication would only benefit on the 1st multiply operation.
Can we expand the math by hand?
• Pseudo algorithm to compute a Joint’s World Space
– Ten 4x4 matrix multiplications
– 1 matrix inversion (very expensive) in the middle
JointWorldSpace = Scale*Shear*
    ParentScale*ParentShear*
    RotZ*RotY*RotX*
    ((ParentScale*ParentShear).inverse())*
    Translate*
    ParentWorldSpace;
• YES… But you won’t even want to try
• Good luck getting the expanded math right
Ideal Transform Behavior
• Must keep a high-level representation of the algorithm
• Perform the absolute minimum required number of math operations
– It must track known values
– It must continue tracking values through matrix multiplications
• Utilize known information to provide a cheaper alternative to full matrix inversions
• Interface/adapt to existing 4x4 Matrix data types
Introducing Xform Building Blocks (XBB)
• C++ library to enable composition of 3D transforms
• Instead of a general purpose 4x4 matrix, it provides specific types for different transforms.
• Tracks known values through multiplication chains
• Deferred evaluation
• Only localized source code changes are required to take advantage of it
XBB Scale, Shear3, & Translation
Before:
    ref::Matrix4x4 S;
    S.makeScale(scaleX, scaleY, scaleZ);
    ref::Matrix4x4 SH;
    SH.makeShear3(shearX, shearY, shearZ);
    ref::Matrix4x4 T;
    T.makeTranslation(transX, transY, transZ);
– 128 bytes of stack used per 4x4 Matrix
– Overhead to initialize to Identity(), then overwrite elements
After XBB:
    xbb::Scale S(scaleX, scaleY, scaleZ);
    xbb::Shear3 SH(shearX, shearY, shearZ);
    xbb::Translation T(transX, transY, transZ);
– 24 bytes of stack per transform
– No overhead to initialize 4x4 elements that are known to be 0 or 1 for each type of transform
XBB Transform Representation
• Stores only the non-constant data needed to represent a 4x4 matrix of the transform type
• Provides methods for element-level access to a 4x4 matrix
– Return known constant values
struct Translation
{
    double x;
    double y;
    double z;
    …
};
    double e00() const { return 1.0; }
    double e01() const { return 0.0; }
    double e02() const { return 0.0; }
    double e03() const { return 0.0; }
    double e10() const { return 0.0; }
    double e11() const { return 1.0; }
    double e12() const { return 0.0; }
    double e13() const { return 0.0; }
    double e20() const { return 0.0; }
    double e21() const { return 0.0; }
    double e22() const { return 1.0; }
    double e23() const { return 0.0; }
    double e30() const { return x; }
    double e31() const { return y; }
    double e32() const { return z; }
    double e33() const { return 1.0; }
Translation(x,y,z)
[1][0][0][0]
[0][1][0][0]
[0][0][1][0]
[x][y][z][1]
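To show how the element accessors let a compact transform stand in for a full 4x4 matrix, here is a minimal sketch. The Translation struct follows the slide; toDense is a hypothetical export helper written for this example and is not XBB's actual API.

```cpp
// Translation in the style shown on the slide: only x, y, z are stored.
struct Translation
{
    double x, y, z;

    double e00() const { return 1.0; }
    double e01() const { return 0.0; }
    double e02() const { return 0.0; }
    double e03() const { return 0.0; }
    double e10() const { return 0.0; }
    double e11() const { return 1.0; }
    double e12() const { return 0.0; }
    double e13() const { return 0.0; }
    double e20() const { return 0.0; }
    double e21() const { return 0.0; }
    double e22() const { return 1.0; }
    double e23() const { return 0.0; }
    double e30() const { return x; }
    double e31() const { return y; }
    double e32() const { return z; }
    double e33() const { return 1.0; }
};

// Hypothetical helper: writes any transform exposing e00()..e33()
// into a dense 4x4 array. Constant accessors inline to literal stores.
template <typename XformT>
void toDense(const XformT& t, double (&out)[4][4])
{
    out[0][0] = t.e00(); out[0][1] = t.e01(); out[0][2] = t.e02(); out[0][3] = t.e03();
    out[1][0] = t.e10(); out[1][1] = t.e11(); out[1][2] = t.e12(); out[1][3] = t.e13();
    out[2][0] = t.e20(); out[2][1] = t.e21(); out[2][2] = t.e22(); out[2][3] = t.e23();
    out[3][0] = t.e30(); out[3][1] = t.e31(); out[3][2] = t.e32(); out[3][3] = t.e33();
}
```

Because toDense is a template over the transform type, the same code would work for a Scale or rotation type that exposes the same sixteen accessors.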
XBB Transform Constancy
• Each transform identifies whether each 4x4 matrix element is a constant 0, a constant 1, or Not Constant
• Constancy is suitable as a template parameter
– Matrix Multiply will make use of it
enum Constancy
{
    ConstantZero,
    ConstantOne,
    NotConstant
};
    static const Constancy c00 = ConstantOne;
    static const Constancy c01 = ConstantZero;
    static const Constancy c02 = ConstantZero;
    static const Constancy c03 = ConstantZero;
    static const Constancy c10 = ConstantZero;
    static const Constancy c11 = ConstantOne;
    static const Constancy c12 = ConstantZero;
    static const Constancy c13 = ConstantZero;
    static const Constancy c20 = ConstantZero;
    static const Constancy c21 = ConstantZero;
    static const Constancy c22 = ConstantOne;
    static const Constancy c23 = ConstantZero;
    static const Constancy c30 = NotConstant;
    static const Constancy c31 = NotConstant;
    static const Constancy c32 = NotConstant;
    static const Constancy c33 = ConstantOne;
Translation(x,y,z)
[1][0][0][0]
[0][1][0][0]
[0][0][1][0]
[x][y][z][1]
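A minimal sketch of why Constancy is useful as a template parameter: a single product term can be resolved without a multiply whenever either factor is a known 0 or 1. The product function below is a hypothetical illustration, not XBB's implementation; it relies on conditions that are compile-time constants rather than on the template specializations the slides mention.

```cpp
enum Constancy
{
    ConstantZero,
    ConstantOne,
    NotConstant
};

// One term a*b of a row-times-column sum, where the Constancy of each
// factor is a template parameter. Every condition depends only on A and B,
// so an optimizing compiler can discard the branches that are not taken.
template <Constancy A, Constancy B>
inline double product(double a, double b)
{
    return (A == ConstantZero || B == ConstantZero) ? 0.0   // known zero: no multiply
         : (A == ConstantOne)                       ? b     // 1 * b: no multiply
         : (B == ConstantOne)                       ? a     // a * 1: no multiply
         : a * b;                                           // general case
}
```

A general matrix multiply cannot make this decision on its own: under strict IEEE floating-point rules a compiler typically may not rewrite x * 0.0 as 0.0, because x could be NaN or infinite. Encoding the knowledge in the type lets the library skip the term explicitly.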
XBB Rotations
Before:
    ref::Matrix4x4 Rx;
    Rx.makeRotationX(rotX);
    ref::Matrix4x4 Ry;
    Ry.makeRotationY(rotY);
    ref::Matrix4x4 Rz;
    Rz.makeRotationZ(rotZ);
– Computes sine(angle) and cosine(angle)
– 128 bytes of stack used per 4x4 Matrix
– Overhead to initialize to Identity(), then overwrite elements
After XBB:
    xbb::RotationX Rx(rotX);
    xbb::RotationY Ry(rotY);
    xbb::RotationZ Rz(rotZ);
– Computes sine(angle) and cosine(angle)
– 16 bytes of stack per rotation
– No overhead to initialize 4x4 elements that are known to be 0 or 1 for each type of transform
XBB Rotation Representation
• Stores the sine and cosine of the angle, not the angle itself.
• Provides methods for element-level access to a 4x4 matrix
– Return known constant values
struct RotationX
{
    double cosineOfAngle;
    double sineOfAngle;
    …
};
    double e00() const { return 1.0; }
    double e01() const { return 0.0; }
    double e02() const { return 0.0; }
    double e03() const { return 0.0; }
    double e10() const { return 0.0; }
    double e11() const { return cosineOfAngle; }
    double e12() const { return sineOfAngle; }
    double e13() const { return 0.0; }
    double e20() const { return 0.0; }
    double e21() const { return -sineOfAngle; }
    double e22() const { return cosineOfAngle; }
    double e23() const { return 0.0; }
    double e30() const { return 0.0; }
    double e31() const { return 0.0; }
    double e32() const { return 0.0; }
    double e33() const { return 1.0; }
Rotate X axis(angle)
[1][0][0][0]
[0][c][s][0]
[0][-s][c][0]
[0][0][0][1]
XBB Multiply
Before:
    ref::Matrix4x4 SxSH;
    SxSH = S*SH;
After XBB:
    auto SxSH = S*SH;
    xbb::Matrix4x3 SxSH_Matrix;
    SxSH.to(SxSH_Matrix);
• No math is performed by S*SH. Instead, a new type, Multiply<Scale, Shear3>, is returned.
• Math is deferred until you explicitly export to a general purpose matrix.
• XBB’s Multiply uses the Constancy of its template parameters to define its own Constancy values.
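The deferred multiply can be sketched with a small expression template. This is an assumption-laden illustration, not XBB's source: the sketch namespace, XformTag, the runtime at(r, c) accessor, and the array-based to() are all hypothetical, and the sketch ignores Constancy entirely so that it can focus on operator* returning a type instead of a result.

```cpp
#include <type_traits>

namespace sketch {

struct XformTag {};  // marks types allowed to take part in operator*

struct Scale : XformTag
{
    double x, y, z;
    Scale(double iX, double iY, double iZ) : x(iX), y(iY), z(iZ) {}
    double at(int r, int c) const
    {
        if (r != c) return 0.0;
        return r == 0 ? x : r == 1 ? y : r == 2 ? z : 1.0;
    }
};

struct Translation : XformTag
{
    double x, y, z;
    Translation(double iX, double iY, double iZ) : x(iX), y(iY), z(iZ) {}
    double at(int r, int c) const
    {
        if (r == 3 && c < 3) return c == 0 ? x : c == 1 ? y : z;
        return r == c ? 1.0 : 0.0;
    }
};

// Returned by operator*: remembers its operands and performs no arithmetic.
template <typename LeftT, typename RightT>
struct Multiply : XformTag
{
    LeftT left;
    RightT right;
    Multiply(const LeftT& l, const RightT& r) : left(l), right(r) {}

    // Element (r, c) of left * right, computed on demand.
    double at(int r, int c) const
    {
        double sum = 0.0;
        for (int k = 0; k < 4; ++k) sum += left.at(r, k) * right.at(k, c);
        return sum;
    }

    // The math happens only here, on export to a dense array.
    void to(double (&out)[4][4]) const
    {
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                out[r][c] = at(r, c);
    }
};

// Building the expression only constructs a Multiply<LeftT, RightT>.
template <typename LeftT, typename RightT,
          typename = typename std::enable_if<
              std::is_base_of<XformTag, LeftT>::value &&
              std::is_base_of<XformTag, RightT>::value>::type>
Multiply<LeftT, RightT> operator*(const LeftT& l, const RightT& r)
{
    return Multiply<LeftT, RightT>(l, r);
}

} // namespace sketch
```

In a long chain this on-demand at() recomputes nested elements repeatedly; the reduce() step shown in the slides avoids that by expanding each operand once.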
Multiplication Chains
Before:
    ref::Matrix4x4 jointLocalSpace;
    jointLocalSpace = S*SH*Rz*Ry*Rx*T;
– 5 matrix multiplications: 320 multiplications, 240 adds
After XBB:
    xbb::Matrix4x3 jointLocalSpace;
    (S*SH*Rz*Ry*Rx*T).to(jointLocalSpace);
– The chain has the type:
    Multiply<Multiply<Multiply<Multiply<Multiply<Scale, Shear3>,
        RotationZ>,
        RotationY>,
        RotationX>,
        Translation>
– Confirmed that the generated assembly has the minimum math operations
Speedup 2.45x
Deferred Evaluation (reduce)
typedef ReducedMatrix
<
    c00, c01, c02, c03,
    c10, c11, c12, c13,
    c20, c21, c22, c23,
    c30, c31, c32, c33
> ReducedType;
• ReducedMatrix is based on a transform's Constancy.
– Only has data members for NotConstant matrix elements
• Multiply's reduce recursively expands its left and right operands
– Expands out the entire multiplication chain
• Each of the 4x4 elements is set by setByMatrixMultiply
– Actually multiplies a column by a row
– Knows the Constancy of the elements from the reduced left and right transforms
• Uses template specialization based on the Constancy
– Only the exact terms necessary are accessed
– Emits only the necessary multiplications & additions
ReducedType Multiply::reduce() const
{
    const auto tl = left.reduce();
    const auto tr = right.reduce();
    ReducedType r;
    r.setByMatrixMultiply<0,0>(tl,tr);
    r.setByMatrixMultiply<0,1>(tl,tr);
    ...
    r.setByMatrixMultiply<3,3>(tl,tr);
    return r;
}
XBB: Cached Rotations
• Many Hierarchy operations change only the Translation of a Joint.
– If we could cache the Rotation transforms, then many expensive sin/cos calls could be avoided.
– Matrix4x4 is too big (128 bytes) to cache one for each of Rotation X, Y, and Z.
• XBB rotations are only 16 bytes each
– Small enough to cache inside the Joint object
• Use cached sin/cos of angles:
(S*SH*cached.Rz*cached.Ry*cached.Rx*T).to(jointLocalSpace);
– Speedup 12.71x
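A minimal sketch of the caching idea, assuming a rotation primitive that stores only the sine and cosine of its angle (two doubles, hence 16 bytes). The `Joint` layout, the setter names, and the `trigEvaluations` counter are hypothetical and exist only to make the saving visible.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical XBB-style rotation: only sin and cos are stored.
struct RotationX {
    double s, c;
    explicit RotationX(double angle) : s(std::sin(angle)), c(std::cos(angle)) {}
};
static_assert(sizeof(RotationX) == 16, "two 8-byte doubles");

// Hypothetical joint that keeps its rotation cached between updates.
struct Joint {
    double tx = 0.0, ty = 0.0, tz = 0.0;
    double angleX = 0.0;
    RotationX cachedRx{0.0};
    int trigEvaluations = 0;   // instrumentation for this sketch only

    // sin/cos are recomputed only when the angle actually changes.
    void setAngleX(double a) {
        if (a != angleX) {
            angleX = a;
            cachedRx = RotationX(a);
            ++trigEvaluations;
        }
    }

    // Translation-only edits never touch the cached rotation.
    void setTranslation(double x, double y, double z) { tx = x; ty = y; tz = z; }
};

inline void example() {
    Joint j;
    j.setAngleX(0.5);
    j.setTranslation(1.0, 2.0, 3.0);
    j.setTranslation(4.0, 5.0, 6.0);
    assert(j.trigEvaluations == 1);   // two translation edits, no new sin/cos
}
```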
XBB Identity & Transpose
• Identity is free in any multiplication chain
– Optimized out entirely
– Only 1 byte of stack space (empty struct)
Identity id;
(S*SH*id*R*T).to(result);
• Transpose is free in any multiplication chain
– Deferred evaluation pulls results out in a different order
– No additional math or data movement
(S*SH*R*T).transpose().to(result);
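One way an Identity can vanish from a chain at compile time is through overloads of `operator*` that simply return the other operand. The sketch below assumes this mechanism; XBB may do it differently, and the `Scale`, `Translation`, and `Multiply` definitions here are hypothetical placeholders.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// An empty struct: carries no data, occupies 1 byte as a standalone object.
struct Identity {};
static_assert(std::is_empty<Identity>::value, "Identity has no data members");
static_assert(sizeof(Identity) == 1, "empty struct occupies one byte");

struct Scale       { double x, y, z; };
struct Translation { double x, y, z; };

template <typename L, typename R>
struct Multiply { L left; R right; };

// General case: build a deferred Multiply node.
template <typename L, typename R>
Multiply<L, R> operator*(const L& l, const R& r) { return {l, r}; }

// Identity on either side: return the other operand, no node is created.
template <typename T>
T operator*(const T& t, const Identity&) { return t; }
template <typename T>
T operator*(const Identity&, const T& t) { return t; }
inline Identity operator*(const Identity&, const Identity&) { return {}; }

// S*id*T has exactly the same type as S*T.
static_assert(std::is_same<
                  decltype(std::declval<Scale>() * std::declval<Identity>() *
                           std::declval<Translation>()),
                  Multiply<Scale, Translation>>::value,
              "Identity leaves no trace in the chain type");

inline void example() {
    const Scale S{2.0, 3.0, 4.0};
    const Translation T{1.0, 1.0, 1.0};
    const Identity id;
    auto chain = S * id * T;
    assert(sizeof(chain) == sizeof(Scale) + sizeof(Translation));
}
```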
Before: Inverse of (Scale*Shear)
inverseOfSxSH = (S*SH).inverse();
• Inverse is very expensive
– Determinant
– Cofactor
– Transpose
– Division
– Scalar matrix multiply
After XBB: Inverse of (Scale*Shear)
(S*SH).inverse().to(inverseOfSxSH);
• MAGIC happens
– Inverse becomes part of deferred evaluation!
• Because we have a representation of the multiplication chain
– We can move the inverse inside the multiplication chain and reverse its order:
(SH.inverse()*S.inverse()).to(inverseOfSxSH);
• Inverse of most transform primitives is free
– Except Scale, which costs 3 divisions
• During deferred evaluation
– The logical 4x4 matrix values are reordered and their signs flipped where needed to represent the inverse
– Speedup 6.43x
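The reordering rule is the standard identity (A*B)^-1 = B^-1 * A^-1. The sketch below applies it to a Scale*Translation pair rather than Scale*Shear, because the shear layout is not given here; all type definitions, the `apply` helper, and the `Point` type are assumptions made for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Point { double x, y, z; };

struct Scale {
    double x, y, z;
    Scale inverse() const { return {1.0 / x, 1.0 / y, 1.0 / z}; }  // 3 divisions
    Point apply(const Point& p) const { return {p.x * x, p.y * y, p.z * z}; }
};

struct Translation {
    double x, y, z;
    Translation inverse() const { return {-x, -y, -z}; }           // sign flips only
    Point apply(const Point& p) const { return {p.x + x, p.y + y, p.z + z}; }
};

template <typename L, typename R>
struct Multiply {
    L left;
    R right;

    // Row-vector convention: p * (L*R) applies L first, then R.
    Point apply(const Point& p) const { return right.apply(left.apply(p)); }

    // (L*R)^-1 = R^-1 * L^-1: invert each operand and swap their order.
    auto inverse() const
        -> Multiply<decltype(std::declval<const R&>().inverse()),
                    decltype(std::declval<const L&>().inverse())> {
        return {right.inverse(), left.inverse()};
    }
};

inline Multiply<Scale, Translation> operator*(const Scale& s, const Translation& t) {
    return {s, t};
}

static_assert(std::is_same<
                  decltype(std::declval<Multiply<Scale, Translation>>().inverse()),
                  Multiply<Translation, Scale>>::value,
              "the inverse chain has its operands in reverse order");

inline void example() {
    const Scale S{2.0, 4.0, 8.0};
    const Translation T{1.0, 2.0, 3.0};
    const auto M = S * T;
    const Point q = M.apply(Point{1.0, 1.0, 1.0});   // (3, 6, 11)
    const Point p = M.inverse().apply(q);            // back to (1, 1, 1)
    assert(p.x == 1.0 && p.y == 1.0 && p.z == 1.0);
}
```

No determinant or cofactor expansion appears anywhere: the only arithmetic the inverse adds is the three divisions for Scale.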
XBB Integration to DWA Motion System
• Provided template specializations for adapters to map between DWA math classes and XBB's.
– Allows XBB deferred evaluation directly into DWA matrix types
• In many scenarios, the transforms could have been Identity based on logic inside the Joint.
– To take full advantage of XBB, we needed to know the exact type of the transforms involved.
• Templatized the Hierarchy algorithm, making conditional logic controlled by template parameters, e.g.
– Order of Rotations
– Scale Propagation Mode
• Specialized templates based on parameters to
– Use the correct type of XBB transform (Identity whenever possible)
– Multiply the Rotations in the correct order
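Selecting the transform chain from template parameters might look like the following sketch. The types are empty markers that only report a name, so the chosen chain can be inspected; `RotationOrder`, `Rotations`, `JointLocalSpace`, and the `HasScale` flag are hypothetical names, not the DWA or XBB API.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Marker types standing in for XBB transforms (names are assumptions).
struct Identity  { static std::string name() { return "I"; } };
struct Scale     { static std::string name() { return "S"; } };
struct RotationX { static std::string name() { return "Rx"; } };
struct RotationY { static std::string name() { return "Ry"; } };
struct RotationZ { static std::string name() { return "Rz"; } };

template <typename L, typename R>
struct Multiply {
    static std::string name() { return L::name() + "*" + R::name(); }
};

enum class RotationOrder { XYZ, ZYX };

// Each rotation order is specialized to multiply in the matching sequence.
template <RotationOrder Order> struct Rotations;
template <> struct Rotations<RotationOrder::XYZ> {
    using type = Multiply<Multiply<RotationX, RotationY>, RotationZ>;
};
template <> struct Rotations<RotationOrder::ZYX> {
    using type = Multiply<Multiply<RotationZ, RotationY>, RotationX>;
};

// A compile-time flag replaces a runtime branch: when the joint has no
// scale, Identity is used so the scale contributes nothing to the chain.
template <RotationOrder Order, bool HasScale>
struct JointLocalSpace {
    using ScaleType = typename std::conditional<HasScale, Scale, Identity>::type;
    using type = Multiply<ScaleType, typename Rotations<Order>::type>;
};

inline void example() {
    using Chain = JointLocalSpace<RotationOrder::ZYX, false>::type;
    assert(Chain::name() == "I*Rz*Ry*Rx");
}
```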
XBB Integration to DWA Motion System (continued…)
• Built a jump table with instances of the algorithm for all the different combinations of options and rotation orders.
– Used enums as indexes into a multi-dimensional array of function pointers to the corresponding algorithm instance to execute.
• Used XBB for decomposing a World Space Matrix4x4 into individual Joint attributes.
• Rewrote the expensive "hier_apply_fk_around_pivot" with XBB directly vs. going through the Hierarchy object
– Avoids the high overhead of building a Hierarchy on the fly
• Performed non-XBB-related optimizations
– Reduced dynamic memory allocation by replacing local std::vector<T> with a stack-based array when possible
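The jump table described above can be sketched as a two-dimensional array of function pointers indexed by enums. The enum names, the two options shown, and the placeholder body of `evaluateHierarchy` are assumptions; the real system has more options and a real algorithm behind each entry.

```cpp
#include <cassert>

// Hypothetical runtime options; the last enumerator gives the table size.
enum RotationOrder { kXYZ = 0, kZYX, kNumRotationOrders };
enum ScaleMode     { kNoScale = 0, kPropagateScale, kNumScaleModes };

// One instance of the algorithm per option combination. The body is a
// placeholder that returns a tag identifying which instance ran.
template <RotationOrder Order, ScaleMode Mode>
int evaluateHierarchy() {
    return Order * 10 + Mode;
}

using HierarchyFn = int (*)();

// Every combination is instantiated once and stored in the table.
static const HierarchyFn kJumpTable[kNumRotationOrders][kNumScaleModes] = {
    { &evaluateHierarchy<kXYZ, kNoScale>, &evaluateHierarchy<kXYZ, kPropagateScale> },
    { &evaluateHierarchy<kZYX, kNoScale>, &evaluateHierarchy<kZYX, kPropagateScale> },
};

// The runtime enum values pick the precompiled instance in one indexed call.
inline int dispatch(RotationOrder order, ScaleMode mode) {
    return kJumpTable[order][mode]();
}

inline void example() {
    assert(dispatch(kZYX, kPropagateScale) == 11);
}
```

The branch on options happens once, at the table lookup, instead of repeatedly inside the algorithm.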
XBB DWA Motion System Results (Before vs. After)
– hier_apply_fk_around_pivot: Speedup 2.8x
– Motion System: Speedup 1.6x
– Overall: Speedup 1.2x
XBB DWA Motion System Scaling
• Reducing the Critical Path helped Thread Scaling.
• Reached goal of 30 fps on a single Avoton cartridge
Call to Action
• A good way to improve the impact of vectorization or threading is to reduce the amount of work being done outside those data-parallel regions.
– Ideally, do less work in the first place.
• Complex optimization problems can be represented in C++ and presented back to the compiler in a form it can excel at optimizing.
– Expanding math by hand is untenable.
• You can do much more with C++11/14 to encapsulate problems while retaining the original high-level algorithm
– Look for optimization problems that might be representable at a higher level.
Future Work
• XBB has exactly the features required to support the DWA Motion System.
• For general-purpose use
– More transformations and math operations might be required, e.g.
  • Inverse of a general 4x4 matrix
  • Single-precision version or template-based data type
• XBB can be licensed or potentially open sourced upon request.
– Could be of use to CAD, Animation Tools, and Gaming.
• Contact Alex Wells (alex.m.wells@intel.com)
Electronic AWB - Electronic Air Waybill
Freightoscope 6 views
FOSSLight Community Day 2023-11-30 by Shane Coughlan
FOSSLight Community Day 2023-11-30FOSSLight Community Day 2023-11-30
FOSSLight Community Day 2023-11-30
Shane Coughlan8 views
JioEngage_Presentation.pptx by admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254559 views
Quality Engineer: A Day in the Life by John Valentino
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the Life
John Valentino10 views
Streamlining Your Business Operations with Enterprise Application Integration... by Flexsin
Streamlining Your Business Operations with Enterprise Application Integration...Streamlining Your Business Operations with Enterprise Application Integration...
Streamlining Your Business Operations with Enterprise Application Integration...
Flexsin 5 views

DreamWorks Animation

  • 1. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. DreamWorks Animation*: Slashing the cost of 3d Matrix Math using X-Form (Transform) Building Blocks
  • 4. DreamWorks Animation: Slashing the cost of 3d Matrix Math using X-Form (Transform) Building Blocks. Alex Wells (presenter) & Martin Watt (DWA), August 12 & 13, 2015.
  • 8. DWA* Character Animation Speedup After XBB [chart comparing Before and After]: Motion System speedup 1.6x; overall speedup 1.2x.
  • 9. Agenda
      ◦ Motion System in DWA Character Animation
      ◦ Observed performance bottlenecks in Motion System
      ◦ 3d Matrix transforms
      ◦ How would an ideal transform behave
      ◦ XBB representation
      ◦ XBB deferred evaluation
      ◦ Results
  • 10. How is a skeleton represented for animation?
      ◦ To represent the bones of a skeleton in 3d space, an animation tool builds a Hierarchy of Joints that records how they are connected.
          – Typically a Directed Acyclic Graph of Joints
  • 11. How is each Joint represented?
      ◦ Relative to a parent Joint (in Local Space), each Joint needs to model:
          – Rotational Euler angles (around the X, Y, and Z axes) and rotation order
          – Scale (of the X, Y, and Z axes)
          – Shear (along the X, Y, and Z axes)
          – Translation (X, Y, and Z components)
      ◦ Animation curves change values over time and drive the Joint’s attributes (rotation, translation, etc.)
  • 12. How does the skeleton influence the skin?
      ◦ Deformers, which compute the final 3d vertices of a character’s skin, need a “Frame” of reference to apply offsets from.
      ◦ The “World Space” Position and Orientation of the Joints from the Hierarchy (skeleton) provide that “Frame” of reference.
  • 13. Representing a “Frame” of reference
      ◦ A 4x4 Matrix can represent the Position and Orientation of a Joint in World Space.
      ◦ When used in this manner, the 4x4 Matrix is commonly referred to as a 3d transform (x-form).
      ◦ A 4x4 Matrix is typically implemented literally as a 4x4 array of floating point values:
            struct Matrix4x4 { double m[4][4]; };
  • 14. Why a 4x4 Matrix?
      ◦ Rotation, Scale, Shear, and Translation can all be represented as 4x4 Matrices.
      ◦ Multiple 4x4 Matrices can be concatenated (multiplied) together into a single 4x4 Matrix.
      ◦ 3d points and 3d vectors (offsets) can be multiplied through a 4x4 Matrix to transform them to the position and orientation in “World Space” that the matrix represents.
      ◦ For each Joint, the matrices representing Scale, Shear, Rotation, and Translation are combined into a single “Local Space” 4x4 Matrix.
  • 15. How To Calculate The World Space Transform Of A Joint?
      ◦ By recursively combining the “Local Space” transform of a Joint with its parent Joint’s “Local Space” until the root of the hierarchy is reached, a 4x4 Matrix can be accumulated that represents the World Space of that Joint.
      ◦ As there are many Joints, it pays off to cache a “World Space” 4x4 Matrix at each Joint, so that a recursive walk up the hierarchy can stop early if a clean “World Space” has been cached.
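The cached recursive walk described on slide 15 can be sketched roughly as follows. This is a minimal illustration, not DWA's Hierarchy code: the `Joint` record, the `worldClean` flag, the `worldSpace` function, and the index-based parent link are all hypothetical names, and the row-vector convention (child matrix on the left, translation in the bottom row) is assumed from the matrices shown later in the deck.

```cpp
#include <cassert>
#include <vector>

struct Matrix4x4 { double m[4][4]; };

// Plain triple-loop concatenation (assumed convention: child * parent).
Matrix4x4 multiply(const Matrix4x4& a, const Matrix4x4& b) {
    Matrix4x4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Identity with (x, y, z) in the bottom row, as on the Translation slide.
Matrix4x4 makeTranslation(double x, double y, double z) {
    Matrix4x4 t{};
    for (int i = 0; i < 4; ++i) t.m[i][i] = 1.0;
    t.m[3][0] = x; t.m[3][1] = y; t.m[3][2] = z;
    return t;
}

// Hypothetical joint record; parent == -1 marks the root.
struct Joint {
    int parent = -1;
    Matrix4x4 local{};        // "Local Space" transform
    Matrix4x4 cachedWorld{};  // cached "World Space" transform
    bool worldClean = false;  // true while cachedWorld is still valid
};

// Walks up the hierarchy only until a clean cached World Space is found.
const Matrix4x4& worldSpace(std::vector<Joint>& joints, int index) {
    Joint& j = joints[index];
    if (!j.worldClean) {
        j.cachedWorld = (j.parent < 0)
            ? j.local
            : multiply(j.local, worldSpace(joints, j.parent));
        j.worldClean = true;
    }
    return j.cachedWorld;
}
```

A real system would also have to clear `worldClean` on a Joint and all of its descendants whenever an attribute changes; that invalidation step is left out of the sketch.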
  • 16. Hierarchy is the core of DWA’s Motion System
      ◦ Each time step, 1000s of Joint attributes change, invalidating a Hierarchy’s cached World Space and Local Space transforms.
      ◦ 1000s of operations on Hierarchy objects build up a complex skeleton.
      ◦ Imagine how many bones are used to represent a 4-legged creature with a tail & wings.
      ◦ Due to the recursion, there is little opportunity for data vectorization or threading.
  • 17. Motion System Is On The Critical Path
      ◦ Despite heavy parallelization of the Deformation System (green & yellow), it can’t start until the Motion System (red) finishes assembling a Hierarchy.
  • 18. Wall Time Spent in Each Category
      ◦ Motion System dwarfs the other systems.
      ◦ Amdahl’s law keeps our threading & vectorization improvements in the Deformation System from having a larger overall impact.
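For reference, Amdahl's law in its usual form is given below. The symbols $p$ (fraction of wall time spent in the part being improved) and $s$ (speedup of that part) are notation introduced here, and the numbers in the example are hypothetical, not measurements from the deck.

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
\qquad\text{e.g. } p = 0.5,\ s = 4:\quad
S_{\text{overall}} = \frac{1}{0.5 + 0.125} = 1.6,
\qquad
\lim_{s \to \infty} S_{\text{overall}} = \frac{1}{1 - p} = 2
```

However fast the improved part becomes, the remaining fraction $1 - p$ caps the overall gain, which is why a serial Motion System limits the payoff from speeding up the Deformation System.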
  • 19. Time Spent inside each type of Operator
      ◦ “hier_apply_fk_around_pivot” is the hottest operator
          – Operates on a Hierarchy
          – Verified in Intel® VTune™ Amplifier XE
      ◦ Several other “hier” related operations take up other top hot spots.
  • 20. Matrix Concatenation (Multiplication)
      ◦ Typical implementation
          – Loop over rows
          – Loop over columns
          – Compute each result element by multiplying one row of the first matrix across one column of the other
      ◦ Simple enough, but how much work did we really just do?
            struct Matrix4x4 { double m[4][4]; };

            Matrix4x4 operator * (const Matrix4x4 &iOther) {
                Matrix4x4 result;
                for (int r = 0; r < 4; ++r) {
                    for (int c = 0; c < 4; ++c) {
                        double sum = 0.0;
                        for (int k = 0; k < 4; ++k) {
                            sum += m[r][k] * iOther.m[k][c];
                        }
                        result.m[r][c] = sum;
                    }
                }
                return result;
            }
  • 21. Expensive Matrix Concatenation
      ◦ 64 Multiplies (double precision)
      ◦ 48 Additions (double precision)
            Matrix4x4 operator * (const Matrix4x4 &iOther) {
                Matrix4x4 result;
                result.m[0][0] = m[0][0]*iOther.m[0][0] + m[0][1]*iOther.m[1][0] + m[0][2]*iOther.m[2][0] + m[0][3]*iOther.m[3][0];
                result.m[0][1] = m[0][0]*iOther.m[0][1] + m[0][1]*iOther.m[1][1] + m[0][2]*iOther.m[2][1] + m[0][3]*iOther.m[3][1];
                result.m[0][2] = m[0][0]*iOther.m[0][2] + m[0][1]*iOther.m[1][2] + m[0][2]*iOther.m[2][2] + m[0][3]*iOther.m[3][2];
                result.m[0][3] = m[0][0]*iOther.m[0][3] + m[0][1]*iOther.m[1][3] + m[0][2]*iOther.m[2][3] + m[0][3]*iOther.m[3][3];
                result.m[1][0] = m[1][0]*iOther.m[0][0] + m[1][1]*iOther.m[1][0] + m[1][2]*iOther.m[2][0] + m[1][3]*iOther.m[3][0];
                result.m[1][1] = m[1][0]*iOther.m[0][1] + m[1][1]*iOther.m[1][1] + m[1][2]*iOther.m[2][1] + m[1][3]*iOther.m[3][1];
                result.m[1][2] = m[1][0]*iOther.m[0][2] + m[1][1]*iOther.m[1][2] + m[1][2]*iOther.m[2][2] + m[1][3]*iOther.m[3][2];
                result.m[1][3] = m[1][0]*iOther.m[0][3] + m[1][1]*iOther.m[1][3] + m[1][2]*iOther.m[2][3] + m[1][3]*iOther.m[3][3];
                result.m[2][0] = m[2][0]*iOther.m[0][0] + m[2][1]*iOther.m[1][0] + m[2][2]*iOther.m[2][0] + m[2][3]*iOther.m[3][0];
                result.m[2][1] = m[2][0]*iOther.m[0][1] + m[2][1]*iOther.m[1][1] + m[2][2]*iOther.m[2][1] + m[2][3]*iOther.m[3][1];
                result.m[2][2] = m[2][0]*iOther.m[0][2] + m[2][1]*iOther.m[1][2] + m[2][2]*iOther.m[2][2] + m[2][3]*iOther.m[3][2];
                result.m[2][3] = m[2][0]*iOther.m[0][3] + m[2][1]*iOther.m[1][3] + m[2][2]*iOther.m[2][3] + m[2][3]*iOther.m[3][3];
                result.m[3][0] = m[3][0]*iOther.m[0][0] + m[3][1]*iOther.m[1][0] + m[3][2]*iOther.m[2][0] + m[3][3]*iOther.m[3][0];
                result.m[3][1] = m[3][0]*iOther.m[0][1] + m[3][1]*iOther.m[1][1] + m[3][2]*iOther.m[2][1] + m[3][3]*iOther.m[3][1];
                result.m[3][2] = m[3][0]*iOther.m[0][2] + m[3][1]*iOther.m[1][2] + m[3][2]*iOther.m[2][2] + m[3][3]*iOther.m[3][2];
                result.m[3][3] = m[3][0]*iOther.m[0][3] + m[3][1]*iOther.m[1][3] + m[3][2]*iOther.m[2][3] + m[3][3]*iOther.m[3][3];
                return result;
            }
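One way to check the 64-multiply / 48-addition figure is to run the unrolled formula on an instrumented scalar type. The `Counted` wrapper and `CountedMatrix` below are hypothetical helpers written for this illustration; they are not part of the presentation or of XBB.

```cpp
#include <cassert>

// Hypothetical instrumented scalar: counts every multiply and add performed on it.
struct Counted {
    double v;
    Counted(double x = 0.0) : v(x) {}
    static int multiplies;
    static int additions;
};
int Counted::multiplies = 0;
int Counted::additions = 0;

Counted operator*(Counted a, Counted b) { ++Counted::multiplies; return Counted(a.v * b.v); }
Counted operator+(Counted a, Counted b) { ++Counted::additions;  return Counted(a.v + b.v); }

struct CountedMatrix { Counted m[4][4]; };

// Same arithmetic as the unrolled operator*: each element is a sum of four products.
CountedMatrix multiply(const CountedMatrix& a, const CountedMatrix& b) {
    CountedMatrix r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j]
                      + a.m[i][2] * b.m[2][j] + a.m[i][3] * b.m[3][j];
    return r;
}
```

Each of the 16 result elements costs 4 multiplies and 3 additions, so resetting the counters and calling `multiply` once leaves `Counted::multiplies` at 64 and `Counted::additions` at 48.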
  • 22. Are Any of Those 16 Matrix Values Known At Compile Time?
      ◦ Good news! YES!
      ◦ If you knew the exact transform a 4x4 matrix was representing, you would know quite a few 0 and 1 values at compile time.
            Identity        Translation(x,y,z)   Shear(x,y,z)    Scale(x,y,z)
            [1][0][0][0]    [1][0][0][0]         [1][0][0][0]    [x][0][0][0]
            [0][1][0][0]    [0][1][0][0]         [x][1][0][0]    [0][y][0][0]
            [0][0][1][0]    [0][0][1][0]         [y][z][1][0]    [0][0][z][0]
            [0][0][0][1]    [x][y][z][1]         [0][0][0][1]    [0][0][0][1]
  • 23. What About Rotations?
      ◦ Building rotation matrices is more expensive because of the need to call sine and cosine on the angle.
      ◦ Rotations also have 0 and 1 values.
            let s = sine(angle), c = cosine(angle)
            Rotate X axis(angle)   Rotate Y axis(angle)   Rotate Z axis(angle)
            [1][0][0][0]           [c][0][-s][0]          [c][s][0][0]
            [0][c][s][0]           [0][1][0][0]           [-s][c][0][0]
            [0][-s][c][0]          [s][0][c][0]           [0][0][1][0]
            [0][0][0][1]           [0][0][0][1]           [0][0][0][1]
  • 24. Huge Optimization Potential!
      ◦ Unfortunately, the matrix multiply method doesn’t know that the 4x4 Matrix it was passed has any 0 or 1 values.
          – So it cannot avoid performing math operations.
      ◦ Even if we had separate classes to represent the different transformations, and a version of the matrix multiply method for each:
          – The result becomes a general 4x4 matrix.
          – Chains of multiplication would only benefit on the 1st multiply operation.
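The limitation on slide 24 can be made concrete with a hand-specialized overload. The sketch below assumes a hypothetical `Translation` type holding only x, y, z and the row-vector layout from slide 22 (translation in the bottom row); it is an illustration of the "separate classes" idea, not code from the deck.

```cpp
#include <cassert>

struct Matrix4x4 { double m[4][4]; };

// Hypothetical dedicated type: only the three non-constant values are stored.
struct Translation { double x, y, z; };

// General matrix times translation. Knowing the translation's 0s and 1s
// reduces the work to 12 multiplies and 12 additions (instead of 64 and 48),
// but the result is an ordinary Matrix4x4 that remembers nothing about its structure.
Matrix4x4 operator*(const Matrix4x4& a, const Translation& t) {
    Matrix4x4 r;
    for (int i = 0; i < 4; ++i) {
        r.m[i][0] = a.m[i][0] + a.m[i][3] * t.x;
        r.m[i][1] = a.m[i][1] + a.m[i][3] * t.y;
        r.m[i][2] = a.m[i][2] + a.m[i][3] * t.z;
        r.m[i][3] = a.m[i][3];
    }
    return r;
}
```

Because the return type is a plain `Matrix4x4`, any further multiplication in a chain has to fall back to the general 64-multiply routine, which is the problem the slide describes.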
  • 25. Can we expand the math by hand?
      ◦ Pseudo algorithm to compute a Joint’s World Space
          – 10 4x4 matrix multiplications
          – 1 matrix inversion (very expensive) in the middle
            JointWorldSpace = Scale*Shear*
                              ParentScale*ParentShear*
                              RotZ*RotY*RotX*
                              ((ParentScale*ParentShear).inverse())*
                              Translate*
                              ParentWorldSpace;
      ◦ YES… But you won’t even want to try
      ◦ Good luck getting the expanded math right
  • 26. Ideal Transform Behavior
      ◦ Must keep the high level representation of the algorithm
      ◦ Perform the absolute minimum required number of math operations
          – It must track known values
          – Continue tracking values through matrix multiplications
      ◦ Utilize known information to provide a cheaper alternative to full matrix inversions
      ◦ Interface/adapt to existing 4x4 Matrix data types
  • 27. Introducing Xform Building Blocks (XBB)
      ◦ C++ library to enable composition of 3d transforms
      ◦ Instead of a general purpose 4x4 matrix, it provides specific types for different transforms.
      ◦ Tracks known values through multiplication chains
      ◦ Deferred Evaluation
      ◦ Only localized source code changes are required to take advantage of it
  • 28. XBB Scale, Shear3, & Translation
      ◦ Before
            ref::Matrix4x4 S;
            S.makeScale(scaleX, scaleY, scaleZ);
            ref::Matrix4x4 SH;
            SH.makeShear3(shearX, shearY, shearZ);
            ref::Matrix4x4 T;
            T.makeTranslation(transX, transY, transZ);
          – 128 Bytes of Stack Used Per 4x4 Matrix
          – Overhead to initialize to Identity(), then overwrite elements
      ◦ After XBB
            xbb::Scale S(scaleX, scaleY, scaleZ);
            xbb::Shear3 SH(shearX, shearY, shearZ);
            xbb::Translation T(transX, transY, transZ);
          – 24 Bytes of Stack
          – No overhead to initialize 4x4 elements that are known to be 0 or 1 for each type of transform
  • 29. XBB Transform Representation
      ◦ Stores only the non-constant data needed to represent a 4x4 matrix of the transform type
      ◦ Provides methods for element level access to a 4x4 matrix
          – Return known constant values
            Translation(x,y,z)
            [1][0][0][0]
            [0][1][0][0]
            [0][0][1][0]
            [x][y][z][1]

            struct Translation {
                double x;
                double y;
                double z;
                …
            };

            double e00() const { return 1.0; }
            double e01() const { return 0.0; }
            double e02() const { return 0.0; }
            double e03() const { return 0.0; }
            double e10() const { return 0.0; }
            double e11() const { return 1.0; }
            double e12() const { return 0.0; }
            double e13() const { return 0.0; }
            double e20() const { return 0.0; }
            double e21() const { return 0.0; }
            double e22() const { return 1.0; }
            double e23() const { return 0.0; }
            double e30() const { return x; }
            double e31() const { return y; }
            double e32() const { return z; }
            double e33() const { return 1.0; }
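To show how the element accessors on slide 29 might be consumed, here is a small sketch that exports any transform exposing `e00()` through `e33()` into a general matrix. The `toMatrix4x4` helper is a hypothetical name introduced for this example (the deck's own export call is `.to(...)`), and the `Translation` type is a reconstruction of the slide's struct, not the library source.

```cpp
#include <cassert>

struct Matrix4x4 { double m[4][4]; };

// Reconstruction of the slide's Translation: three doubles plus constant accessors.
struct Translation {
    double x, y, z;
    double e00() const { return 1.0; } double e01() const { return 0.0; }
    double e02() const { return 0.0; } double e03() const { return 0.0; }
    double e10() const { return 0.0; } double e11() const { return 1.0; }
    double e12() const { return 0.0; } double e13() const { return 0.0; }
    double e20() const { return 0.0; } double e21() const { return 0.0; }
    double e22() const { return 1.0; } double e23() const { return 0.0; }
    double e30() const { return x; }   double e31() const { return y; }
    double e32() const { return z; }   double e33() const { return 1.0; }
};

// Only the non-constant data is stored: 3 doubles instead of 16.
static_assert(sizeof(Translation) == 3 * sizeof(double), "no storage for constants");

// Hypothetical helper: fills a general matrix from any type exposing e00()..e33().
template <typename Transform>
Matrix4x4 toMatrix4x4(const Transform& t) {
    return Matrix4x4{{
        {t.e00(), t.e01(), t.e02(), t.e03()},
        {t.e10(), t.e11(), t.e12(), t.e13()},
        {t.e20(), t.e21(), t.e22(), t.e23()},
        {t.e30(), t.e31(), t.e32(), t.e33()},
    }};
}
```

Since the accessors return literals, an optimizing compiler can typically inline them and treat the constant elements as plain constants in whatever code calls them.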
  • 30. XBB Transform Constancy
      ◦ Each transform identifies whether each 4x4 matrix element is a constant 0, a constant 1, or Not Constant
      ◦ Constancy is suitable as a template parameter
          – Matrix Multiply will make use of it
            enum Constancy { ConstantZero, ConstantOne, NotConstant };

            Translation(x,y,z)
            [1][0][0][0]
            [0][1][0][0]
            [0][0][1][0]
            [x][y][z][1]

            static const Constancy c00 = ConstantOne;
            static const Constancy c01 = ConstantZero;
            static const Constancy c02 = ConstantZero;
            static const Constancy c03 = ConstantZero;
            static const Constancy c10 = ConstantZero;
            static const Constancy c11 = ConstantOne;
            static const Constancy c12 = ConstantZero;
            static const Constancy c13 = ConstantZero;
            static const Constancy c20 = ConstantZero;
            static const Constancy c21 = ConstantZero;
            static const Constancy c22 = ConstantOne;
            static const Constancy c23 = ConstantZero;
            static const Constancy c30 = NotConstant;
            static const Constancy c31 = NotConstant;
            static const Constancy c32 = NotConstant;
            static const Constancy c33 = ConstantOne;
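A rough idea of how Constancy could steer a multiply at compile time is sketched below. The deck does not show XBB's internals, so `productConstancy` and `term` are hypothetical helpers; they only illustrate that a product involving a known 0 or 1 needs no multiply instruction.

```cpp
#include <cassert>

enum Constancy { ConstantZero, ConstantOne, NotConstant };

// Constancy of the product a*b, computable at compile time.
constexpr Constancy productConstancy(Constancy a, Constancy b) {
    return (a == ConstantZero || b == ConstantZero) ? ConstantZero
         : (a == ConstantOne && b == ConstantOne)   ? ConstantOne
                                                    : NotConstant;
}

// One term a*b of a row-by-column dot product. The conditions depend only on
// template parameters, so the compiler can fold them away and keep one branch.
template <Constancy A, Constancy B>
inline double term(double a, double b) {
    if (A == ConstantZero || B == ConstantZero) return 0.0;  // no math needed
    if (A == ConstantOne) return b;                          // 1 * b
    if (B == ConstantOne) return a;                          // a * 1
    return a * b;                                            // general case
}
```

With C++17, `if constexpr` would state the same intent more directly; plain `if` is used here so the sketch compiles under older standards.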
  • 31. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. XBB Rotations ref::Matrix4x4 Rx; Rx.makeRotationX(rotX); ref::Matrix4x4 Ry; Ry.makeRotationY(rotY); ref::Matrix4x4 Rz; Rz.makeRotationZ(rotZ); 128 Bytes of Stack Used Per 4x4 Matrix Overhead to initialize to Identity(), then overwrite elements 31 xbb::RotationX Rx(rotX); xbb::RotationY Ry(rotY); xbb::RotationZ Rz(rotZ);  Before  After XBB 16 Bytes of Stack No overhead to initialize 4x4 elements that are known to be 0 or 1 for each type of transform sin(angle) cosine(angle) sine(angle) cosine(angle)
  • 32. XBB Rotation Representation
      struct RotationX {
          double cosineOfAngle;
          double sineOfAngle;
          …
      };
      • Stores the sine and cosine of the angle, not the angle itself.
      • Provides methods for element-level access to a 4x4 matrix.
        – Return known constant values
      Rotate X axis(angle):
          [1][ 0][0][0]
          [0][ c][s][0]
          [0][-s][c][0]
          [0][ 0][0][1]
      Element accessors:
          double e00() const { return 1.0; }
          double e01() const { return 0.0; }
          double e02() const { return 0.0; }
          double e03() const { return 0.0; }
          double e10() const { return 0.0; }
          double e11() const { return cosineOfAngle; }
          double e12() const { return sineOfAngle; }
          double e13() const { return 0.0; }
          double e20() const { return 0.0; }
          double e21() const { return -sineOfAngle; }
          double e22() const { return cosineOfAngle; }
          double e23() const { return 0.0; }
          double e30() const { return 0.0; }
          double e31() const { return 0.0; }
          double e32() const { return 0.0; }
          double e33() const { return 1.0; }
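A compact rotation of this kind might look like the sketch below. Only the two data members come from the slide. The constructor taking radians, the `rotate` helper, and the row-vector convention (p' = p * M, inferred from the translation sitting in the last matrix row) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // hypothetical namespace, not part of XBB

struct RotationX {
    double cosineOfAngle;
    double sineOfAngle;

    // Assumption: the angle is given in radians; trig is evaluated once, here.
    explicit RotationX(double angleRadians)
        : cosineOfAngle(std::cos(angleRadians)),
          sineOfAngle(std::sin(angleRadians)) {}

    // Hypothetical helper: applies the logical 4x4 matrix to a point using the
    // row-vector convention. x is untouched by a rotation about the X axis,
    // so only y and z are passed in.
    void rotate(double& y, double& z) const {
        const double y0 = y;
        const double z0 = z;
        y = y0 * cosineOfAngle - z0 * sineOfAngle;  // column 1: [0, c, -s, 0]
        z = y0 * sineOfAngle + z0 * cosineOfAngle;  // column 2: [0, s,  c, 0]
    }
};

// Two doubles and nothing else: 16 bytes on common platforms,
// versus 128 bytes for a full 4x4 matrix of doubles.
static_assert(sizeof(RotationX) == 2 * sizeof(double),
              "RotationX should hold only sine and cosine");

}  // namespace sketch
```

Rotating the point (0, 1, 0) by a quarter turn with this convention moves it onto the Z axis, which is a quick way to check the sign placement of the sine terms.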
  • 33. XBB Multiply
      Before:
          ref::Matrix4x4 SxSH;
          SxSH = S*SH;
      After XBB:
          auto SxSH = S*SH;
          xbb::Matrix4x3 SxSH_Matrix;
          SxSH.to(SxSH_Matrix);
      • No math is performed by S*SH. Instead, a new type Multiply<Scale, Shear3> is returned.
      • Math is deferred until you explicitly export to a general-purpose matrix.
      • XBB’s Multiply uses the Constancy of its template parameters to define its own Constancy values.
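The deferral itself is a classic expression-template pattern and can be sketched as follows. This is an illustration under assumptions, not XBB's implementation. The `Transform` CRTP base, the `e(r, c)` accessor, and exporting to a plain `double[4][4]` are hypothetical. Translation stands in for Shear3 because the slides do not give the layout of Shear3. The export here uses a naive loop, whereas XBB uses Constancy to drop terms.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of XBB

// CRTP base so that operator* only matches transform types from this sketch.
template <typename Derived>
struct Transform {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

struct Scale : Transform<Scale> {
    double x, y, z;
    Scale(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    double e(int r, int c) const {
        if (r != c) return 0.0;
        return r == 0 ? x : r == 1 ? y : r == 2 ? z : 1.0;
    }
};

struct Translation : Transform<Translation> {
    double x, y, z;
    Translation(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    double e(int r, int c) const {
        if (r == c) return 1.0;
        if (r == 3) return c == 0 ? x : c == 1 ? y : z;
        return 0.0;
    }
};

// Holds its operands; performs no arithmetic until e() or to() is called.
template <typename L, typename R>
struct Multiply : Transform<Multiply<L, R> > {
    L left;
    R right;
    Multiply(const L& l, const R& r) : left(l), right(r) {}

    double e(int r, int c) const {
        double sum = 0.0;
        for (int k = 0; k < 4; ++k) sum += left.e(r, k) * right.e(k, c);
        return sum;
    }

    // Explicit export: this is where the math finally happens.
    void to(double (&out)[4][4]) const {
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c) out[r][c] = e(r, c);
    }
};

// operator* only builds the type Multiply<L, R>.
template <typename L, typename R>
Multiply<L, R> operator*(const Transform<L>& l, const Transform<R>& r) {
    return Multiply<L, R>(l.self(), r.self());
}

}  // namespace sketch
```

Writing `auto st = s * t;` with a `Scale` and a `Translation` yields an object of type `Multiply<Scale, Translation>`. Nothing is computed until `st.to(m)` is called.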
  • 34. Multiplication Chains
      Before:
          ref::Matrix4x4 jointLocalSpace;
          jointLocalSpace = S*SH*Rz*Ry*Rx*T;
        – 5 matrix multiplications: 320 multiplications, 240 adds
      After XBB:
          xbb::Matrix4x3 jointLocalSpace;
          (S*SH*Rz*Ry*Rx*T).to(jointLocalSpace);
        – Type of the chain:
          Multiply<Multiply<Multiply<Multiply<Multiply<Scale, Shear3>, RotationZ>, RotationY>, RotationX>, Translation>
        – Confirmed assembly has minimum math operations
      Speedup: 2.45x
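The operation counts quoted for the general 4x4 path follow from the textbook row-by-column product, assuming no terms are skipped: each of the 16 result elements needs 4 multiplications and 3 additions.

```latex
\begin{align*}
\text{multiplications per } 4\times 4 \text{ product} &= 16 \times 4 = 64 \\
\text{additions per } 4\times 4 \text{ product}       &= 16 \times 3 = 48 \\
\text{chain of 5 products: multiplications}           &= 5 \times 64 = 320 \\
\text{chain of 5 products: additions}                 &= 5 \times 48 = 240
\end{align*}
```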
  • 35. Deferred Evaluation (reduce)
      typedef ReducedMatrix<
          c00, c01, c02, c03,
          c10, c11, c12, c13,
          c20, c21, c22, c23,
          c30, c31, c32, c33
      > ReducedType;
      • ReducedMatrix is based on a transform’s Constancy.
        – Only has data members for NotConstant matrix elements
      • Multiply’s reduce recursively expands its left and right operands.
        – Expands out the entire multiplication chain
      • 4x4 elements are set by setByMatrixMultiply.
        – Actually multiplies a column by a row
        – Knows the Constancy of the elements from the reduced left and right transforms
      • Uses template specialization based on the Constancy.
        – Only the exact terms necessary are accessed
        – Emits only the necessary multiplications & additions
      ReducedType Multiply::reduce() const {
          const auto tl = left.reduce();
          const auto tr = right.reduce();
          ReducedType r;
          r.setByMatrixMultiply<0,0>(tl,tr);
          r.setByMatrixMultiply<0,1>(tl,tr);
          ...
          r.setByMatrixMultiply<3,3>(tl,tr);
          return r;
      }
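One way specialization on Constancy can remove work from a single term of a dot product is sketched below. The names `TermKind`, `kindOf`, `Term`, `term`, and `multipliesNeeded` are hypothetical, and the real setByMatrixMultiply is not shown in the slides. In this simplified version the operand values are still passed in and merely ignored, whereas the slide states that XBB avoids accessing unnecessary terms at all.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace, not part of XBB

enum Constancy { ConstantZero, ConstantOne, NotConstant };

// What a product a*b collapses to, given what is known about a and b.
enum TermKind { TermIsZero, TermIsOne, TermIsLeft, TermIsRight, TermIsProduct };

constexpr TermKind kindOf(Constancy a, Constancy b) {
    return (a == ConstantZero || b == ConstantZero) ? TermIsZero
         : (a == ConstantOne && b == ConstantOne)   ? TermIsOne
         : (b == ConstantOne)                       ? TermIsLeft
         : (a == ConstantOne)                       ? TermIsRight
                                                    : TermIsProduct;
}

// One specialization per kind; only TermIsProduct contains a multiplication.
template <TermKind K> struct Term;
template <> struct Term<TermIsZero>    { static double eval(double,   double)   { return 0.0; } };
template <> struct Term<TermIsOne>     { static double eval(double,   double)   { return 1.0; } };
template <> struct Term<TermIsLeft>    { static double eval(double a, double)   { return a; } };
template <> struct Term<TermIsRight>   { static double eval(double,   double b) { return b; } };
template <> struct Term<TermIsProduct> { static double eval(double a, double b) { return a * b; } };

// Selects the specialization at compile time from the two Constancy values.
template <Constancy A, Constancy B>
double term(double a, double b) {
    return Term<kindOf(A, B)>::eval(a, b);
}

// Compile-time count of multiplications a term really needs.
constexpr int multipliesNeeded(Constancy a, Constancy b) {
    return kindOf(a, b) == TermIsProduct ? 1 : 0;
}

static_assert(multipliesNeeded(ConstantOne, NotConstant) == 0,
              "multiplying by a known 1 costs nothing");

}  // namespace sketch
```

Summing four such terms per element, with the constants taken from the left and right operand types, gives a row-by-column product in which zero terms vanish and unit factors become plain copies.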
  • 36. XBB: Cached Rotations
      • Many Hierarchy operations change only the Translation of a Joint.
        – If we could cache the Rotation transforms, then many expensive sin/cos calls could be avoided.
        – Matrix4x4 is too big (128 bytes) to cache one for each of Rotation X, Y, and Z.
      • XBB rotations are only 16 bytes each.
        – Small enough to cache inside the Joint object
      Use cached sin/cos of angles:
          (S*SH*cached.Rz*cached.Ry*cached.Rx*T).to(jointLocalSpace);
      Speedup: 12.71x
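The caching idea can be sketched with a toy joint. The `Joint` layout, its setter names, and the `trigUpdates` counter (instrumentation for this example only) are assumptions, since the slides do not show DWA's Joint class. Only one rotation axis is cached here to keep the sketch short.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // hypothetical namespace, not part of XBB or DWA code

struct RotationX {
    double cosineOfAngle;
    double sineOfAngle;
    explicit RotationX(double angleRadians = 0.0)
        : cosineOfAngle(std::cos(angleRadians)),
          sineOfAngle(std::sin(angleRadians)) {}
};

struct Joint {
    double tx, ty, tz;
    RotationX cachedRx;  // 16 bytes: cheap enough to keep per joint
    int trigUpdates;     // counts sin/cos evaluations, for illustration only

    Joint() : tx(0.0), ty(0.0), tz(0.0), cachedRx(0.0), trigUpdates(0) {}

    // sin/cos are evaluated only when the angle actually changes.
    void setRotationX(double angleRadians) {
        cachedRx = RotationX(angleRadians);
        ++trigUpdates;
    }

    // Translation changes leave the cached rotation untouched.
    void setTranslation(double x, double y, double z) {
        tx = x;
        ty = y;
        tz = z;
    }
};

}  // namespace sketch
```

After one call to `setRotationX`, any number of `setTranslation` calls can rebuild the joint's local transform from `cachedRx` without further trigonometry.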
  • 37. XBB Identity & Transpose
      • Identity is free in any multiplication chain.
        – Optimized out entirely
        – Only 1 byte of stack space (empty struct)
          Identity id;
          (S*SH*id*R*T).to(result);
      • Transpose is free in any multiplication chain.
        – Deferred evaluation pulls results out in a different order
        – No additional math or data movement
          (S*SH*R*T).transpose().to(result);
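Both properties can be sketched with a few small types. The operator overloads, the `Transposed` wrapper, the free function `transpose`, and the `e(r, c)` accessor are hypothetical. XBB exposes transpose as a member function on the chain, which is not reproduced here.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of XBB

// No data members: nothing to store, nothing to initialize.
struct Identity {};
static_assert(std::is_empty<Identity>::value, "Identity carries no data");

struct Translation {
    double x, y, z;
    double e(int r, int c) const {
        if (r == c) return 1.0;
        if (r == 3) return c == 0 ? x : c == 1 ? y : z;
        return 0.0;
    }
};

// Multiplying by Identity just hands back the other operand: no math at all.
inline Translation operator*(const Translation& t, const Identity&) { return t; }
inline Translation operator*(const Identity&, const Translation& t) { return t; }

// Transpose as a view: element (r, c) is read from (c, r) of the wrapped
// transform, so no values are moved or recomputed.
template <typename T>
struct Transposed {
    T inner;
    double e(int r, int c) const { return inner.e(c, r); }
};

template <typename T>
Transposed<T> transpose(const T& t) {
    Transposed<T> view = { t };
    return view;
}

}  // namespace sketch
```

With a `Translation` holding (1, 2, 3), the transposed view reports the translation terms in the last column instead of the last row.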
  • 38. Before: Inverse of (Scale*Shear)
          inverseOfSxSH = (S*SH).inverse();
      • Inverse is very expensive:
        – Determinant
        – Cofactor
        – Transpose
        – Division
        – Scalar matrix multiply
  • 39. After XBB: Inverse of (Scale*Shear)
          (S*SH).inverse().to(inverseOfSxSH);
      • MAGIC happens.
        – Inverse becomes part of deferred evaluation!
      • Because we have a representation of the multiplication chain,
        – we can move the inverse inside the multiplication chain and reverse its order:
          (SH.inverse()*S.inverse()).to(inverseOfSxSH);
      • Inverse of most transform primitives is free,
        – except Scale, which costs 3 divisions.
      • During deferred evaluation,
        – the logical 4x4 matrix values are reordered and have their signs flipped where needed to represent the inverse.
      Speedup: 6.43x
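The order reversal rests on the identity (A*B)^-1 = B^-1 * A^-1, and it can be sketched at the type level. The free function `inverse`, the `InverseType` trait, and the plain aggregate `Multiply` are hypothetical. Translation and RotationX stand in for Shear3, whose layout the slides do not give.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of XBB

struct Scale       { double x, y, z; };
struct Translation { double x, y, z; };
struct RotationX   { double cosineOfAngle, sineOfAngle; };

// Primitive inverses: only Scale needs arithmetic (3 divisions).
inline Scale inverse(const Scale& s) {
    Scale r = { 1.0 / s.x, 1.0 / s.y, 1.0 / s.z };
    return r;
}
inline Translation inverse(const Translation& t) {
    Translation r = { -t.x, -t.y, -t.z };  // sign flips only
    return r;
}
inline RotationX inverse(const RotationX& q) {
    RotationX r = { q.cosineOfAngle, -q.sineOfAngle };  // rotate by -angle
    return r;
}

template <typename L, typename R>
struct Multiply { L left; R right; };

// In this sketch a primitive inverts to its own type;
// a product inverts to the reversed product of inverses.
template <typename T>
struct InverseType { typedef T type; };

template <typename L, typename R>
struct InverseType<Multiply<L, R> > {
    typedef Multiply<typename InverseType<R>::type,
                     typename InverseType<L>::type> type;
};

// (L*R)^-1 = R^-1 * L^-1, applied recursively down the chain.
template <typename L, typename R>
typename InverseType<Multiply<L, R> >::type inverse(const Multiply<L, R>& m) {
    typename InverseType<Multiply<L, R> >::type out = { inverse(m.right), inverse(m.left) };
    return out;
}

}  // namespace sketch
```

Inverting a `Multiply<Scale, Translation>` yields a `Multiply<Translation, Scale>` whose members hold the negated translation and the reciprocal scale, with no determinant or cofactor work involved.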
  • 40. XBB Integration to DWA Motion System
      • Provided template specializations for adapters to map between DWA math classes and XBB’s.
        – Allows XBB deferred evaluation directly into DWA matrix types
      • In many scenarios, the transforms could have been Identity based on logic inside the Joint.
        – To take full advantage of XBB, we needed to know the exact type of the transforms involved.
      • Templatized the Hierarchy algorithm, making conditional logic controlled by template parameters, e.g.
        – Order of Rotations
        – Scale Propagation Mode
      • Specialized templates based on those parameters to
        – Use the correct type of XBB transform (Identity whenever possible)
        – Multiply the Rotations in the correct order
  • 41. XBB Integration to DWA Motion System (continued…)
      • Built a jump table with instances of the algorithm for all the different combinations of options and rotation orders.
        – Used enums as indexes into a multi-dimensional array of function pointers to the corresponding algorithm instance to execute.
      • Used XBB for decomposing a World Space Matrix4x4 into individual Joint attributes.
      • Rewrote the expensive “hier_apply_fk_around_pivot” with XBB directly vs. going through the Hierarchy object.
        – Avoids the high overhead of building a Hierarchy on the fly
      • Performed non-XBB-related optimizations.
        – Reduced dynamic memory allocation by replacing local std::vector<T> with a stack-based array when possible
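A jump table of template instances indexed by enums might be built as in the sketch below. The enum values, the `evaluateJoint` signature, and its placeholder body (which only encodes which instance ran) are hypothetical. The slides do not list the actual options or rotation orders.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace, not DWA or XBB code

enum RotationOrder { OrderXYZ, OrderZYX, RotationOrderCount };
enum ScaleMode     { ScaleOff, ScaleOn, ScaleModeCount };

// One instance of the algorithm per combination of template parameters.
// Placeholder body: returns a code identifying the instance that executed.
template <RotationOrder Order, ScaleMode Mode>
int evaluateJoint(int input) {
    return input * 100 + Order * 10 + Mode;
}

typedef int (*JointFn)(int);

// Run-time enums index a table of compile-time-specialized instances.
static const JointFn kJumpTable[RotationOrderCount][ScaleModeCount] = {
    { &evaluateJoint<OrderXYZ, ScaleOff>, &evaluateJoint<OrderXYZ, ScaleOn> },
    { &evaluateJoint<OrderZYX, ScaleOff>, &evaluateJoint<OrderZYX, ScaleOn> },
};

// A single indirect call replaces the per-joint conditional logic.
inline int dispatch(RotationOrder order, ScaleMode mode, int input) {
    return kJumpTable[order][mode](input);
}

}  // namespace sketch
```

The design trades code size for branch-free inner logic: each instance is compiled with its options fixed, so the exact transform types are known where the multiplication chain is built.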
  • 42. DWA Motion System Results
      [Chart: Before vs. After XBB]
      – hier_apply_fk_around_pivot speedup: 2.8x
      – Motion System speedup: 1.6x
      – Overall speedup: 1.2x
  • 43. XBB DWA Motion System Scaling
      • Reducing the Critical Path helped Thread Scaling.
      • Reached the goal of 30 fps on a single Avoton cartridge.
  • 44. Call to Action
      • A good way to improve the impact of vectorization or threading is to reduce the amount of work being done outside those data-parallel regions.
        – Ideally, do less work in the first place.
      • Complex optimization problems can be represented in C++ and presented back to the compiler in a form it can excel at optimizing.
        – Expanding math by hand is untenable.
      • You can do much more with C++11/14 to encapsulate problems while retaining the original high-level algorithm.
        – Look for optimization problems that might be representable at a higher level.
  • 45. Future Work
      • XBB has exactly the features required to support the DWA Motion System.
      • For general-purpose use, more transformations and math operations might be required, e.g.
        – Inverse of a general 4x4 matrix
        – Single-precision version or template-based data type
      • XBB can be licensed or potentially open sourced upon request.
        – Could be of use to CAD, Animation Tools, and Gaming.
      • Contact Alex Wells (alex.m.wells@intel.com)