DreamWorks Animation

DreamWorks Animation*: Slashing the cost of 3d Matrix Math using X-Form (Transform) Building Blocks

  1. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. DreamWorks Animation*: Slashing the cost of 3d Matrix Math using X-Form (Transform) Building Blocks
  4. DreamWorks Animation: Slashing the cost of 3d Matrix Math using X-Form (Transform) Building Blocks
     Alex Wells (presenter) & Martin Watt (DWA), August 12 & 13, 2015
  8. DWA* Character Animation Speedup After XBB
     • [Chart: Before vs. After XBB]
     • Motion System speedup: 1.6x
     • Overall speedup: 1.2x
  9. Agenda
     • Motion System in DWA Character Animation
     • Observed performance bottlenecks in Motion System
     • 3D matrix transforms
     • How would an ideal transform behave
     • XBB representation
     • XBB deferred evaluation
     • Results
  10. How is a skeleton represented for animation?
     • To represent the bones of a skeleton in 3D space, an animation tool builds a hierarchy of joints and records how they are connected.
       – Typically a Directed Acyclic Graph of joints
  11. How is each Joint represented?
     • Relative to its parent joint (in local space), each joint needs to model:
       – Rotation: Euler angles around the X, Y, and Z axes, plus the rotation order
       – Scale along the X, Y, and Z axes
       – Shear along the X, Y, and Z axes
       – Translation (X, Y, and Z components)
     • Animation curves change values over time and drive the joint's attributes (rotation, translation, etc.).
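The per-joint data described in slides 10 and 11 can be sketched as a flat array of joints with parent indices. This is a minimal illustration, not DWA's data structure: the names `Joint`, `Hierarchy`, `RotationOrder`, and `addJoint` are hypothetical, and storing a single parent index models a tree, the common special case of the joint graph.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical rotation-order tag: which axis rotation is applied first.
enum class RotationOrder { XYZ, XZY, YXZ, YZX, ZXY, ZYX };

// Local-space attributes of one joint, relative to its parent.
// Animation curves would write new values into these fields each time step.
struct Joint {
    int parent = -1;                             // index of the parent joint, -1 for the root
    std::array<double, 3> rotation{};            // Euler angles (radians) around X, Y, Z
    RotationOrder order = RotationOrder::XYZ;
    std::array<double, 3> scale{1.0, 1.0, 1.0};  // scale along X, Y, Z
    std::array<double, 3> shear{};               // shear along X, Y, Z
    std::array<double, 3> translation{};         // X, Y, Z components
};

// Joints stored so that every parent precedes its children.
struct Hierarchy {
    std::vector<Joint> joints;

    int addJoint(int parent) {
        // The parent must already exist (or be -1 for a root).
        assert(parent >= -1 && parent < static_cast<int>(joints.size()));
        Joint j;
        j.parent = parent;
        joints.push_back(j);
        return static_cast<int>(joints.size()) - 1;
    }
};
```

Keeping parents ahead of children means a single forward pass over `joints` always visits a parent before any of its children.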
  12. How does the skeleton influence the skin?
     • Deformers, which compute the final 3D vertices of a character's skin, need a "frame" of reference to apply offsets from.
     • The world-space position and orientation of the joints in the hierarchy (skeleton) provide that frame of reference.
  13. Representing a "Frame" of reference
     • A 4x4 matrix can represent the position and orientation of a joint in world space.
     • When used this way, the 4x4 matrix is commonly called a 3D transform (x-form).
     • A 4x4 matrix is typically implemented literally as a 4x4 array of floating-point values:

         struct Matrix4x4 { double m[4][4]; };
  14. Why a 4x4 Matrix?
     • Rotation, scale, shear, and translation can all be represented as 4x4 matrices.
     • Multiple 4x4 matrices can be concatenated (multiplied) into a single 4x4 matrix.
     • 3D points and 3D vectors (offsets) can be multiplied through a 4x4 matrix to transform them into the world-space position and orientation that the matrix represents.
     • For each joint, the matrices representing scale, shear, rotation, and translation are combined into a single local-space 4x4 matrix.
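The difference between transforming a point and a vector, mentioned in slide 14, comes down to the implicit fourth coordinate. The sketch below assumes the row-vector convention implied by the slides' matrix layouts (translation stored in the last row, so `v' = v * M`) and an affine matrix whose last column is `[0 0 0 1]`, so no perspective divide is needed. The function names are hypothetical.

```cpp
#include <cassert>

struct Matrix4x4 { double m[4][4]; };
struct Vec3 { double x, y, z; };

// Points carry an implicit w = 1, so the translation row (row 3) is added.
Vec3 transformPoint(const Vec3& p, const Matrix4x4& M) {
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2],
    };
}

// Vectors (offsets) carry an implicit w = 0, so the translation row is ignored.
Vec3 transformVector(const Vec3& v, const Matrix4x4& M) {
    return {
        v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0],
        v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1],
        v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2],
    };
}
```

With a pure translation matrix, a point moves while an offset is left unchanged, which is exactly the behavior a deformer needs when applying offsets relative to a joint's frame.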
  15. How to Calculate the World Space Transform of a Joint?
     • By recursively combining a joint's local-space transform with its parent's, and so on up to the root of the hierarchy, we accumulate a 4x4 matrix that represents the joint's world space.
     • Because there are many joints, it pays off to cache a world-space 4x4 matrix at each joint, so that the recursive walk up the hierarchy can stop early when it reaches a clean cached world space.
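The recursive walk with an early stop at a clean cache can be sketched as follows. The dirty-flag scheme, the names `CachedJoint`, `worldSpace`, and `setLocal`, and the parent-before-child ordering are assumptions for illustration; the composition order `world = local * parentWorld` follows the row-vector convention of the slide 25 formula, which ends in `* ParentWorldSpace`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Matrix4x4 {
    double m[4][4];
    static Matrix4x4 identity() {
        Matrix4x4 r{};                      // all zeros
        for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0;
        return r;
    }
};

// General 4x4 multiply (64 multiplies, 48 adds per call after the first term).
Matrix4x4 operator*(const Matrix4x4& a, const Matrix4x4& b) {
    Matrix4x4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

struct CachedJoint {
    int parent = -1;                        // -1 for the root; parents precede children
    Matrix4x4 local = Matrix4x4::identity();
    Matrix4x4 world = Matrix4x4::identity();
    bool worldDirty = true;
};

// Walks up the parents recursively, stopping at the first clean cached world matrix.
const Matrix4x4& worldSpace(std::vector<CachedJoint>& joints, int i) {
    CachedJoint& j = joints[i];
    if (j.worldDirty) {
        j.world = (j.parent < 0) ? j.local : j.local * worldSpace(joints, j.parent);
        j.worldDirty = false;
    }
    return j.world;
}

// Changing a local transform invalidates this joint's cache and its descendants'.
// Relies on parents being stored before children, so one forward scan suffices.
void setLocal(std::vector<CachedJoint>& joints, int i, const Matrix4x4& local) {
    joints[i].local = local;
    joints[i].worldDirty = true;
    for (std::size_t k = static_cast<std::size_t>(i) + 1; k < joints.size(); ++k)
        if (joints[k].parent >= 0 && joints[joints[k].parent].worldDirty)
            joints[k].worldDirty = true;
}
```

The invalidation scan is conservative (it may re-dirty joints that were already dirty), but it never leaves a stale world matrix behind.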
  16. Hierarchy is the core of DWA's Motion System
     • Every time step, thousands of joint attributes change, invalidating the hierarchy's cached world-space and local-space transforms.
     • Thousands of operations on Hierarchy objects build up a complex skeleton.
     • Imagine how many bones are used to represent a four-legged creature with a tail and wings.
     • Because of the recursion, there is little opportunity for data vectorization or threading.
  17. Motion System Is on the Critical Path
     • Despite heavy parallelization of the Deformation System (green & yellow), it can't start until the Motion System (red) finishes assembling a Hierarchy.
  18. Wall Time Spent in Each Category
     • The Motion System dwarfs the other systems.
     • Amdahl's law limits how much overall impact our threading and vectorization improvements in the Deformation System can have.
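Amdahl's law makes the point of slide 18 quantitative: if a fraction `p` of wall time is sped up by a factor `s` and the rest is unchanged, the overall speedup is `1 / ((1 - p) + p / s)`. The function name is illustrative, and the values in the usage are hypothetical, not measurements from the talk.

```cpp
#include <cassert>
#include <cmath>

// Overall speedup when a fraction p of the wall time is accelerated by factor s
// and the remaining (1 - p) stays serial and unchanged.
double amdahlSpeedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, if the deformation work were half the wall time, making it arbitrarily fast could never more than double overall throughput; the serial Motion System sets the ceiling at `1 / (1 - p)`.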
  19. Time Spent Inside Each Type of Operator
     • "hier_apply_fk_around_pivot" is the hottest operator
       – Operates on a Hierarchy
       – Verified in Intel® VTune™ Amplifier XE
     • Several other "hier"-related operations occupy the other top hot spots.
  20. Matrix Concatenation (Multiplication)
     • Typical implementation:
       – Loop over rows
       – Loop over columns
       – Compute each result element by multiplying one row of the first matrix across one column of the other
     • Simple enough, but how much work did we really just do?

         struct Matrix4x4 {
             double m[4][4];

             Matrix4x4 operator * (const Matrix4x4 &iOther) const
             {
                 Matrix4x4 result;
                 for (int r = 0; r < 4; ++r) {
                     for (int c = 0; c < 4; ++c) {
                         double sum = 0.0;
                         for (int k = 0; k < 4; ++k) {
                             sum += m[r][k] * iOther.m[k][c];
                         }
                         result.m[r][c] = sum;
                     }
                 }
                 return result;
             }
         };
  21. Expensive Matrix Concatenation
     • 64 multiplies (double precision)
     • 48 additions (double precision)

         Matrix4x4 operator * (const Matrix4x4 &iOther) const
         {
             Matrix4x4 result;
             result.m[0][0] = m[0][0]*iOther.m[0][0] + m[0][1]*iOther.m[1][0] + m[0][2]*iOther.m[2][0] + m[0][3]*iOther.m[3][0];
             result.m[0][1] = m[0][0]*iOther.m[0][1] + m[0][1]*iOther.m[1][1] + m[0][2]*iOther.m[2][1] + m[0][3]*iOther.m[3][1];
             result.m[0][2] = m[0][0]*iOther.m[0][2] + m[0][1]*iOther.m[1][2] + m[0][2]*iOther.m[2][2] + m[0][3]*iOther.m[3][2];
             result.m[0][3] = m[0][0]*iOther.m[0][3] + m[0][1]*iOther.m[1][3] + m[0][2]*iOther.m[2][3] + m[0][3]*iOther.m[3][3];
             result.m[1][0] = m[1][0]*iOther.m[0][0] + m[1][1]*iOther.m[1][0] + m[1][2]*iOther.m[2][0] + m[1][3]*iOther.m[3][0];
             result.m[1][1] = m[1][0]*iOther.m[0][1] + m[1][1]*iOther.m[1][1] + m[1][2]*iOther.m[2][1] + m[1][3]*iOther.m[3][1];
             result.m[1][2] = m[1][0]*iOther.m[0][2] + m[1][1]*iOther.m[1][2] + m[1][2]*iOther.m[2][2] + m[1][3]*iOther.m[3][2];
             result.m[1][3] = m[1][0]*iOther.m[0][3] + m[1][1]*iOther.m[1][3] + m[1][2]*iOther.m[2][3] + m[1][3]*iOther.m[3][3];
             result.m[2][0] = m[2][0]*iOther.m[0][0] + m[2][1]*iOther.m[1][0] + m[2][2]*iOther.m[2][0] + m[2][3]*iOther.m[3][0];
             result.m[2][1] = m[2][0]*iOther.m[0][1] + m[2][1]*iOther.m[1][1] + m[2][2]*iOther.m[2][1] + m[2][3]*iOther.m[3][1];
             result.m[2][2] = m[2][0]*iOther.m[0][2] + m[2][1]*iOther.m[1][2] + m[2][2]*iOther.m[2][2] + m[2][3]*iOther.m[3][2];
             result.m[2][3] = m[2][0]*iOther.m[0][3] + m[2][1]*iOther.m[1][3] + m[2][2]*iOther.m[2][3] + m[2][3]*iOther.m[3][3];
             result.m[3][0] = m[3][0]*iOther.m[0][0] + m[3][1]*iOther.m[1][0] + m[3][2]*iOther.m[2][0] + m[3][3]*iOther.m[3][0];
             result.m[3][1] = m[3][0]*iOther.m[0][1] + m[3][1]*iOther.m[1][1] + m[3][2]*iOther.m[2][1] + m[3][3]*iOther.m[3][1];
             result.m[3][2] = m[3][0]*iOther.m[0][2] + m[3][1]*iOther.m[1][2] + m[3][2]*iOther.m[2][2] + m[3][3]*iOther.m[3][2];
             result.m[3][3] = m[3][0]*iOther.m[0][3] + m[3][1]*iOther.m[1][3] + m[3][2]*iOther.m[2][3] + m[3][3]*iOther.m[3][3];
             return result;
         }
  22. Are Any of Those 16 Matrix Values Known at Compile Time?
     • Good news! YES!
     • If you knew exactly which transform a 4x4 matrix represented, you would know quite a few of its 0 and 1 values at compile time.

         Identity             Translation(x,y,z)   Shear(x,y,z)         Scale(x,y,z)
         [1][0][0][0]         [1][0][0][0]         [1][0][0][0]         [x][0][0][0]
         [0][1][0][0]         [0][1][0][0]         [x][1][0][0]         [0][y][0][0]
         [0][0][1][0]         [0][0][1][0]         [y][z][1][0]         [0][0][z][0]
         [0][0][0][1]         [x][y][z][1]         [0][0][0][1]         [0][0][0][1]
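Building these layouts into a general `Matrix4x4` follows the "initialize to identity, then overwrite a few elements" pattern the talk later calls out as overhead. The free functions below are a hypothetical sketch mirroring the layouts in slide 22 (row-vector convention, translation in the last row); every element they do not write keeps its identity value, which is precisely the compile-time knowledge a general 4x4 type discards.

```cpp
#include <cassert>

struct Matrix4x4 { double m[4][4]; };

Matrix4x4 makeIdentity() {
    Matrix4x4 r{};                        // value-initialized: all zeros
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0;
    return r;
}

// Translation(x,y,z): identity with x, y, z in the last row.
Matrix4x4 makeTranslation(double x, double y, double z) {
    Matrix4x4 r = makeIdentity();
    r.m[3][0] = x; r.m[3][1] = y; r.m[3][2] = z;
    return r;
}

// Scale(x,y,z): identity with x, y, z on the first three diagonal elements.
Matrix4x4 makeScale(double x, double y, double z) {
    Matrix4x4 r = makeIdentity();
    r.m[0][0] = x; r.m[1][1] = y; r.m[2][2] = z;
    return r;
}

// Shear(x,y,z): identity with x, y, z below the diagonal of the 3x3 block.
Matrix4x4 makeShear3(double x, double y, double z) {
    Matrix4x4 r = makeIdentity();
    r.m[1][0] = x; r.m[2][0] = y; r.m[2][1] = z;
    return r;
}
```

Each builder writes 16 values to produce a matrix that carries only 3 values of real information.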
  23. What About Rotations?
     • Building rotation matrices is more expensive because sine and cosine must be evaluated on the angle.
     • Rotations also have 0 and 1 values.

         let s = sine(angle)
         let c = cosine(angle)

         Rotate X axis(angle)  Rotate Y axis(angle)  Rotate Z axis(angle)
         [1][0][0][0]          [c][0][-s][0]         [c][s][0][0]
         [0][c][s][0]          [0][1][0][0]          [-s][c][0][0]
         [0][-s][c][0]         [s][0][c][0]          [0][0][1][0]
         [0][0][0][1]          [0][0][0][1]          [0][0][0][1]
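The rotation layouts of slide 23 translate directly into builders that each pay for one `sin` and one `cos`. The function names are hypothetical; the element placement follows the slide, read in the row-vector convention, where row `i` of the matrix is the image of axis `i`.

```cpp
#include <cassert>
#include <cmath>

struct Matrix4x4 { double m[4][4]; };

// s = sin(angle), c = cos(angle); all other elements are the constants 0 or 1.
Matrix4x4 makeRotationX(double angle) {
    const double s = std::sin(angle), c = std::cos(angle);
    return {{{1, 0, 0, 0}, {0, c, s, 0}, {0, -s, c, 0}, {0, 0, 0, 1}}};
}

Matrix4x4 makeRotationY(double angle) {
    const double s = std::sin(angle), c = std::cos(angle);
    return {{{c, 0, -s, 0}, {0, 1, 0, 0}, {s, 0, c, 0}, {0, 0, 0, 1}}};
}

Matrix4x4 makeRotationZ(double angle) {
    const double s = std::sin(angle), c = std::cos(angle);
    return {{{c, s, 0, 0}, {-s, c, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
}
```

A quarter turn checks the convention: around Z, the X axis (row 0) becomes `(0, 1, 0)`; around X, Y becomes Z; around Y, Z becomes X.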
  24. Huge Optimization Potential!
     • Unfortunately, the matrix multiply method doesn't know that the 4x4 matrix it was passed has any 0 or 1 values
       – So it cannot avoid performing the math operations.
     • Even if we had separate classes for the different transformations and a version of the matrix multiply method for each:
       – The result becomes a general 4x4 matrix.
       – Chains of multiplications would benefit only on the first multiply.
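The second bullet of slide 24 can be made concrete. A hypothetical `Translation` type with its own `operator*` against a general matrix saves most of the work, yet it has to return a general `Matrix4x4`, so the next multiply in a chain is back to full cost. The type and overload are illustrative, not XBB's API.

```cpp
#include <cassert>

struct Matrix4x4 { double m[4][4]; };
struct Translation { double x, y, z; };   // hypothetical specialized transform type

// Translation * M, row-vector convention (translation in the last row of T).
// Rows 0-2 of T are identity rows, so rows 0-2 of the result are copies of M;
// only row 3 needs math: 12 multiplies and 12 adds instead of 64 and 48.
Matrix4x4 operator*(const Translation& t, const Matrix4x4& M) {
    Matrix4x4 r = M;                       // rows 0..2 copied unchanged
    for (int c = 0; c < 4; ++c)
        r.m[3][c] = t.x * M.m[0][c] + t.y * M.m[1][c] + t.z * M.m[2][c] + M.m[3][c];
    return r;                              // but the return type is a general 4x4
}
```

Whatever the next factor is, it now sees a `Matrix4x4` with no known 0 or 1 elements, which is why per-type overloads alone only help the first multiply.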
  25. Can we expand the math by hand?
     • Pseudo-algorithm to compute a joint's world space:
       – 10 4x4 matrix multiplications
       – 1 matrix inversion (very expensive) in the middle
     • YES… but you won't even want to try.
     • Good luck getting the expanded math right!

         JointWorldSpace = Scale*Shear*
                           ParentScale*ParentShear*
                           RotZ*RotY*RotX*
                           ((ParentScale*ParentShear).inverse())*
                           Translate*
                           ParentWorldSpace;
  26. Ideal Transform Behavior
     • Must keep the high-level representation of the algorithm
     • Perform the absolute minimum number of math operations required
       – It must track known values
       – It must continue tracking values through matrix multiplications
     • Utilize known information to provide a cheaper alternative to full matrix inversions
     • Interface/adapt to existing 4x4 matrix data types
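As an example of "utilize known information to provide a cheaper alternative to full matrix inversions", the `(ParentScale*ParentShear).inverse()` term from slide 25 needs no general 4x4 inversion. Using (A·B)⁻¹ = B⁻¹·A⁻¹, the inverse of a diagonal scale is a scale by reciprocals, and the unit lower-triangular shear `[1 0 0; x 1 0; y z 1]` inverts to `[1 0 0; -x 1 0; x·z - y, -z, 1]`. The function below is a hypothetical sketch of that idea, not XBB's implementation.

```cpp
#include <cassert>
#include <cmath>

struct Matrix4x4 { double m[4][4]; };

// Inverse of Scale(sx,sy,sz) * Shear3(x,y,z) (row-vector layouts from slide 22),
// computed as Shear3^-1 * Scale^-1: the shear inverse with column j scaled by 1/s_j.
// Three divisions and a handful of multiplies replace a general 4x4 inversion.
Matrix4x4 inverseScaleShear(double sx, double sy, double sz,
                            double x, double y, double z) {
    assert(sx != 0.0 && sy != 0.0 && sz != 0.0);
    const double ix = 1.0 / sx, iy = 1.0 / sy, iz = 1.0 / sz;
    return {{{ix, 0, 0, 0},
             {-x * ix, iy, 0, 0},
             {(x * z - y) * ix, -z * iy, iz, 0},
             {0, 0, 0, 1}}};
}
```

Multiplying the original `Scale * Shear3` by this result yields the identity, which the numeric check exercises for arbitrary non-zero scales.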
  27. Introducing Xform Building Blocks (XBB)
     • C++ library to enable composition of 3D transforms
     • Instead of a general-purpose 4x4 matrix, it provides specific types for different transforms
     • Tracks known values through multiplication chains
     • Deferred evaluation
     • Localized source code changes are required to take advantage of it
  28. XBB Scale, Shear3, & Translation
     • Before:

         ref::Matrix4x4 S;  S.makeScale(scaleX, scaleY, scaleZ);
         ref::Matrix4x4 SH; SH.makeShear3(shearX, shearY, shearZ);
         ref::Matrix4x4 T;  T.makeTranslation(transX, transY, transZ);

       – 128 bytes of stack used per 4x4 matrix
       – Overhead to initialize to Identity(), then overwrite elements
     • After XBB:

         xbb::Scale S(scaleX, scaleY, scaleZ);
         xbb::Shear3 SH(shearX, shearY, shearZ);
         xbb::Translation T(transX, transY, transZ);

       – 24 bytes of stack per transform
       – No overhead to initialize 4x4 elements that are known to be 0 or 1 for each type of transform
  29. XBB Transform Representation
     • Stores only the non-constant data needed to represent the 4x4 matrix of the transform type
     • Provides methods for element-level access to the 4x4 matrix
       – Known constant values are returned directly

         struct Translation { double x; double y; double z; … };

         double e00() const { return 1.0; }  double e01() const { return 0.0; }  double e02() const { return 0.0; }  double e03() const { return 0.0; }
         double e10() const { return 0.0; }  double e11() const { return 1.0; }  double e12() const { return 0.0; }  double e13() const { return 0.0; }
         double e20() const { return 0.0; }  double e21() const { return 0.0; }  double e22() const { return 1.0; }  double e23() const { return 0.0; }
         double e30() const { return x; }    double e31() const { return y; }    double e32() const { return z; }    double e33() const { return 1.0; }

         Translation(x,y,z)
         [1][0][0][0]
         [0][1][0][0]
         [0][0][1][0]
         [x][y][z][1]
  30. XBB Transform Constancy
     • Each transform identifies whether each 4x4 matrix element is a constant 0, a constant 1, or not constant
     • Constancy is suitable as a template parameter
       – Matrix multiply will make use of it

         enum Constancy { ConstantZero, ConstantOne, NotConstant };

         static const Constancy c00 = ConstantOne;   static const Constancy c01 = ConstantZero;  static const Constancy c02 = ConstantZero;  static const Constancy c03 = ConstantZero;
         static const Constancy c10 = ConstantZero;  static const Constancy c11 = ConstantOne;   static const Constancy c12 = ConstantZero;  static const Constancy c13 = ConstantZero;
         static const Constancy c20 = ConstantZero;  static const Constancy c21 = ConstantZero;  static const Constancy c22 = ConstantOne;   static const Constancy c23 = ConstantZero;
         static const Constancy c30 = NotConstant;   static const Constancy c31 = NotConstant;   static const Constancy c32 = NotConstant;   static const Constancy c33 = ConstantOne;

         Translation(x,y,z)
         [1][0][0][0]
         [0][1][0][0]
         [0][0][1][0]
         [x][y][z][1]
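One way a multiply could derive its own Constancy from its operands' (as slide 33 later says XBB's Multiply does) is a small compile-time calculation over the 4x4 grids. This is a hypothetical C++14 `constexpr` sketch, not XBB's code: a product term is zero if either factor is zero and one if both are one; a sum is zero with no surviving terms, keeps a lone term's constancy, and is otherwise not constant (for example, 1 + 1 = 2 is neither 0 nor 1).

```cpp
#include <array>
#include <cassert>

enum Constancy { ConstantZero, ConstantOne, NotConstant };

using ConstancyGrid = std::array<std::array<Constancy, 4>, 4>;

// Constancy of a single product term a*b.
constexpr Constancy mulConstancy(Constancy a, Constancy b) {
    if (a == ConstantZero || b == ConstantZero) return ConstantZero;
    if (a == ConstantOne && b == ConstantOne) return ConstantOne;
    return NotConstant;
}

// Constancy of element (r, c) of A*B, i.e. of sum over k of A[r][k] * B[k][c].
constexpr Constancy productConstancy(const ConstancyGrid& A, const ConstancyGrid& B,
                                     int r, int c) {
    int nonZeroTerms = 0;
    Constancy only = ConstantZero;
    for (int k = 0; k < 4; ++k) {
        const Constancy t = mulConstancy(A[r][k], B[k][c]);
        if (t != ConstantZero) { ++nonZeroTerms; only = t; }
    }
    if (nonZeroTerms == 0) return ConstantZero;
    if (nonZeroTerms == 1) return only;   // a lone 1*1 term stays ConstantOne
    return NotConstant;                   // e.g. 1 + x, or 1 + 1 = 2
}
```

Because the result is a constant expression, it can feed straight into template parameters such as the `c00 … c33` values above.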
  31. XBB Rotations
     • Before:

         ref::Matrix4x4 Rx; Rx.makeRotationX(rotX);
         ref::Matrix4x4 Ry; Ry.makeRotationY(rotY);
         ref::Matrix4x4 Rz; Rz.makeRotationZ(rotZ);

       – 128 bytes of stack used per 4x4 matrix
       – Overhead to initialize to Identity(), then overwrite elements
     • After XBB:

         xbb::RotationX Rx(rotX);
         xbb::RotationY Ry(rotY);
         xbb::RotationZ Rz(rotZ);

       – 16 bytes of stack per rotation: sine(angle) and cosine(angle)
       – No overhead to initialize 4x4 elements that are known to be 0 or 1 for each type of transform
  32. XBB Rotation Representation
     • Stores the sine and cosine of the angle, not the angle itself
     • Provides methods for element-level access to the 4x4 matrix
       – Known constant values are returned directly

         struct RotationX { double cosineOfAngle; double sineOfAngle; … };

         double e00() const { return 1.0; }  double e01() const { return 0.0; }            double e02() const { return 0.0; }            double e03() const { return 0.0; }
         double e10() const { return 0.0; }  double e11() const { return cosineOfAngle; }  double e12() const { return sineOfAngle; }    double e13() const { return 0.0; }
         double e20() const { return 0.0; }  double e21() const { return -sineOfAngle; }   double e22() const { return cosineOfAngle; }  double e23() const { return 0.0; }
         double e30() const { return 0.0; }  double e31() const { return 0.0; }            double e32() const { return 0.0; }            double e33() const { return 1.0; }

         Rotate X axis(angle)
         [1][0][0][0]
         [0][c][s][0]
         [0][-s][c][0]
         [0][0][0][1]
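A self-contained version of the `RotationX` representation in slide 32, with a constructor that pays the trigonometry cost once. The accessors follow the slide's layout; the constructor and the `explicit` keyword are assumptions, since the slide elides the rest of the struct.

```cpp
#include <cassert>
#include <cmath>

// Stores sin/cos of the angle (16 bytes) instead of a 128-byte 4x4 matrix.
struct RotationX {
    double cosineOfAngle;
    double sineOfAngle;

    // Assumed constructor: evaluates sin and cos exactly once.
    explicit RotationX(double angle)
        : cosineOfAngle(std::cos(angle)), sineOfAngle(std::sin(angle)) {}

    // Row 0: [1 0 0 0]
    double e00() const { return 1.0; }  double e01() const { return 0.0; }
    double e02() const { return 0.0; }  double e03() const { return 0.0; }
    // Row 1: [0 c s 0]
    double e10() const { return 0.0; }  double e11() const { return cosineOfAngle; }
    double e12() const { return sineOfAngle; }  double e13() const { return 0.0; }
    // Row 2: [0 -s c 0]
    double e20() const { return 0.0; }  double e21() const { return -sineOfAngle; }
    double e22() const { return cosineOfAngle; }  double e23() const { return 0.0; }
    // Row 3: [0 0 0 1]
    double e30() const { return 0.0; }  double e31() const { return 0.0; }
    double e32() const { return 0.0; }  double e33() const { return 1.0; }
};
```

Only two doubles are stored; the other 14 elements exist purely as inline functions returning constants the compiler can fold away.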
  33. XBB Multiply
     • Before:

         ref::Matrix4x4 SxSH;
         SxSH = S*SH;

     • After XBB:

         auto SxSH = S*SH;
         xbb::Matrix4x3 SxSH_Matrix;
         SxSH.to(SxSH_Matrix);

       – No math is performed. Instead, a new type Multiply<Scale, Shear3> is returned.
       – Math is deferred until you explicitly export to a general-purpose matrix.
       – XBB's Multiply uses the Constancy of its template parameters to define its own Constancy values.
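The deferred multiply in slide 33 is an instance of the expression-template technique: `operator*` only records its operands in a new type, and arithmetic happens when the result is exported. The sketch below shows just the deferral, using a hypothetical runtime `e(r, c)` accessor; it does not include the Constancy-based term elimination that XBB adds. `Matrix4x3` is assumed here to hold the first three columns of an affine 4x4 (the last column is known to be `[0 0 0 1]`).

```cpp
#include <cassert>
#include <utility>

struct Matrix4x3 { double m[4][3]; };   // hypothetical: the 3 varying columns

struct Scale {
    double x, y, z;
    double e(int r, int c) const {
        if (r != c) return 0.0;
        return r == 0 ? x : r == 1 ? y : r == 2 ? z : 1.0;
    }
};

struct Translation {
    double x, y, z;
    double e(int r, int c) const {
        if (r == 3 && c < 3) return c == 0 ? x : c == 1 ? y : z;
        return r == c ? 1.0 : 0.0;
    }
};

// Records both operands; no arithmetic happens at construction.
template <class L, class R>
struct Multiply {
    L left;
    R right;
    // Element (r, c) of left*right, computed only on demand.
    double e(int r, int c) const {
        double sum = 0.0;
        for (int k = 0; k < 4; ++k) sum += left.e(r, k) * right.e(k, c);
        return sum;
    }
    // Export: the point where the deferred math is actually performed.
    void to(Matrix4x3& out) const {
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 3; ++c) out.m[r][c] = e(r, c);
    }
};

// Enabled only for operands that expose e(int, int).
template <class L, class R,
          class = decltype(std::declval<const L&>().e(0, 0) + std::declval<const R&>().e(0, 0))>
Multiply<L, R> operator*(const L& l, const R& r) { return {l, r}; }
```

Chaining `S * T * S` produces `Multiply<Multiply<Scale, Translation>, Scale>`, mirroring the nested type shown for the joint's local space in slide 34.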
  34. Multiplication Chains
     • Before: 5 matrix multiplications = 320 multiplications and 240 additions

         ref::Matrix4x4 jointLocalSpace;
         jointLocalSpace = S*SH*Rz*Ry*Rx*T;

     • After XBB: speedup 2.45x

         xbb::Matrix4x3 jointLocalSpace;
         (S*SH*Rz*Ry*Rx*T).to(jointLocalSpace);

       – Type of the expression: Multiply<Multiply<Multiply<Multiply<Multiply<Scale, Shear3>, RotationZ>, RotationY>, RotationX>, Translation>
       – Confirmed that the generated assembly has the minimum number of math operations
  35. Deferred Evaluation (reduce)
      typedef ReducedMatrix<c00, c01, c02, c03,
                            c10, c11, c12, c13,
                            c20, c21, c22, c23,
                            c30, c31, c32, c33> ReducedType;
      • ReducedMatrix is based on a transform's Constancy.
        – It only has data members for NotConstant matrix elements.
      • Multiply's reduce() recursively expands its left and right operands.
        – This expands out the entire multiplication chain.
      • setByMatrixMultiply is called for each of the 4x4 elements.
        – It actually multiplies a row of the left operand by a column of the right operand.
        – It knows the Constancy of the elements from the reduced left and right transforms.
      • Using template specialization based on the Constancy:
        – Only the exact terms necessary are accessed.
        – Only the necessary multiplications and additions are emitted.
      ReducedType Multiply::reduce() const {
          const auto tl = left.reduce();
          const auto tr = right.reduce();
          ReducedType r;
          r.setByMatrixMultiply<0,0>(tl, tr);
          r.setByMatrixMultiply<0,1>(tl, tr);
          ...
          r.setByMatrixMultiply<3,3>(tl, tr);
          return r;
      }
  36. XBB: Cached Rotations
      • Many Hierarchy operations change only the Translation of a Joint.
        – If we could cache the Rotation transforms, many expensive sin/cos calls could be avoided.
        – A Matrix4x4 is too big (128 bytes) to cache one for each of Rotation X, Y, and Z.
      • XBB rotations are only 16 bytes each.
        – Small enough to cache inside the Joint object.
      Using the cached sin/cos of the angles:
          (S*SH*cached.Rz*cached.Ry*cached.Rx*T).to(jointLocalSpace);
      Speedup: 12.71x
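A sketch of the caching idea, assuming the cache detects staleness by comparing against the last angle it saw (the slide does not say how XBB or the Joint decides when to refresh). `RotationCache`, `Joint::setPose` and the `recomputes` counter are hypothetical; storing the angle alongside sin/cos makes this cache larger than the 16-byte XBB rotation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-axis cache: std::cos/std::sin run only when the angle changes.
struct RotationCache {
    double angle = 0.0;          // last angle seen (assumed staleness key)
    double cosineOfAngle = 1.0;  // cos(0)
    double sineOfAngle = 0.0;    // sin(0)
    int recomputes = 0;          // instrumentation for this example only

    void update(double newAngle) {
        if (newAngle != angle) {
            angle = newAngle;
            cosineOfAngle = std::cos(newAngle);
            sineOfAngle = std::sin(newAngle);
            ++recomputes;
        }
    }
};

struct Joint {
    RotationCache rx, ry, rz;
    double tx = 0.0, ty = 0.0, tz = 0.0;

    void setPose(double ax, double ay, double az, double x, double y, double z) {
        rx.update(ax);
        ry.update(ay);
        rz.update(az);
        tx = x; ty = y; tz = z;  // translation updates are cheap and always applied
    }
};
```

A translation-only update (same angles, new position) then touches no trigonometry at all.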
  37. XBB Identity & Transpose
      • Identity is free in any multiplication chain.
        – It is optimized out entirely.
        – It only takes 1 byte of stack space (empty struct).
      • Transpose is free in any multiplication chain.
        – Deferred evaluation pulls the results out in a different order.
        – No additional math or data movement.
          Identity id;
          (S*SH*id*R*T).to(result);
          (S*SH*R*T).transpose().to(result);
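A sketch of why Identity and Transpose can cost nothing, using hypothetical `mul`, `transpose` and `at(row, col)` names rather than XBB's operators. Multiplying by `Identity` simply returns the other operand, so Identity vanishes from the expression type; `Transpose<T>` swaps the indices on element access instead of moving data.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical names throughout, not XBB's API

// Empty type: every element is a compile-time constant, so no data is stored.
struct Identity {
    double at(int r, int c) const { return r == c ? 1.0 : 0.0; }
};

struct Translation {  // row-vector convention: translation lives in row 3
    double tx, ty, tz;
    double at(int r, int c) const {
        if (r == c) return 1.0;
        if (r == 3) return c == 0 ? tx : c == 1 ? ty : tz;
        return 0.0;
    }
};

// Multiplying by Identity returns the other operand unchanged, so the
// resulting expression type does not mention Identity at all.
template <typename T> T mul(const T& t, Identity) { return t; }
template <typename T> T mul(Identity, const T& t) { return t; }
inline Identity mul(Identity, Identity) { return Identity{}; }  // breaks the tie

// Transpose reads the wrapped transform with swapped indices:
// no arithmetic and no data movement.
template <typename T>
struct Transpose {
    T inner;
    double at(int r, int c) const { return inner.at(c, r); }
};

template <typename T>
Transpose<T> transpose(const T& t) { return Transpose<T>{t}; }

}  // namespace sketch
```

The non-template `mul(Identity, Identity)` is needed because the two templates would otherwise be equally good matches for that call.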
  38. Before: Inverse of (Scale*Shear)
      • Inverse is very expensive:
        – Determinant
        – Cofactor
        – Transpose
        – Division
        – Scalar matrix multiply
          inverseOfSxSH = (S*SH).inverse();
  39. After XBB: Inverse of (Scale*Shear)
          (S*SH).inverse().to(inverseOfSxSH);
      • MAGIC happens: the inverse becomes part of deferred evaluation!
      • Because we have a representation of the multiplication chain,
        – we can move the inverse inside the multiplication chain and reverse its order:
          (SH.inverse()*S.inverse()).to(inverseOfSxSH);
      • The inverse of most transform primitives is free,
        – except Scale, which costs 3 divisions.
      • During deferred evaluation,
        – the logical 4x4 matrix values are reordered, and their signs flipped where needed, to represent the inverse.
      Speedup: 6.43x
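The rewrite relies on the identity (AB)^-1 = B^-1 A^-1. The sketch below shows it at the type level with hypothetical primitives and a hypothetical `mul` helper (used instead of `operator*` to keep it short): Translation and RotationX invert by sign flips, Scale needs its 3 divisions, and `Multiply::inverse()` swaps and inverts its operands without touching any matrix.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical names throughout, not XBB's API

struct Scale {
    double sx, sy, sz;
    // The one primitive here whose inverse costs arithmetic: 3 divisions.
    Scale inverse() const { return Scale{1.0 / sx, 1.0 / sy, 1.0 / sz}; }
};

struct RotationX {
    double cosineOfAngle, sineOfAngle;
    // Rotation by -angle: same cosine, negated sine (equivalent to the transpose).
    RotationX inverse() const { return RotationX{cosineOfAngle, -sineOfAngle}; }
};

struct Translation {
    double tx, ty, tz;
    Translation inverse() const { return Translation{-tx, -ty, -tz}; }
};

template <typename T>
using InverseOf = decltype(std::declval<const T&>().inverse());

template <typename L, typename R>
struct Multiply {
    L left;
    R right;
    // (L*R)^-1 = R^-1 * L^-1: push the inverse inside and reverse the order.
    Multiply<InverseOf<R>, InverseOf<L>> inverse() const {
        return {right.inverse(), left.inverse()};
    }
};

template <typename L, typename R>
Multiply<L, R> mul(const L& l, const R& r) { return {l, r}; }

}  // namespace sketch
```

Because `Multiply` itself has an `inverse()`, the same rule applies recursively to longer chains.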
  40. XBB Integration into the DWA Motion System
      • Provided template specializations for adapters that map between DWA math classes and XBB's.
        – Allows XBB deferred evaluation directly into DWA matrix types.
      • In many scenarios, the transforms could be Identity depending on logic inside the Joint.
        – To take full advantage of XBB, we needed to know the exact types of the transforms involved.
      • Templatized the Hierarchy algorithm so that its conditional logic is controlled by template parameters, e.g.
        – Order of rotations
        – Scale propagation mode
      • Specialized the templates based on those parameters to
        – use the correct type of XBB transform (Identity whenever possible), and
        – multiply the rotations in the correct order.
  41. XBB Integration into the DWA Motion System (continued…)
      • Built a jump table with instances of the algorithm for all the different combinations of options and rotation orders.
        – Used enums as indexes into a multi-dimensional array of function pointers to the corresponding algorithm instance to execute.
      • Used XBB to decompose a world-space Matrix4x4 into individual Joint attributes.
      • Rewrote the expensive "hier_apply_fk_around_pivot" to use XBB directly instead of going through the Hierarchy object.
        – Avoids the high overhead of building a Hierarchy on the fly.
      • Performed non-XBB-related optimizations.
        – Reduced dynamic memory allocation by replacing local std::vector<T> with stack-based arrays where possible.
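A sketch of the jump-table technique from slides 40 and 41: one template instance per combination of options, selected at run time by indexing a 2-D array of function pointers with enums. The slides do not show the real option enums or algorithm signature, so `RotationOrder`, `ScaleMode`, `JointInput` and `evaluateJoint` are hypothetical, and the function body is a stand-in that only identifies which instance ran.

```cpp
#include <cassert>

namespace sketch {  // hypothetical names throughout

enum RotationOrder { XYZ, ZYX, RotationOrderCount };
enum ScaleMode { PropagateScale, NoScalePropagation, ScaleModeCount };

struct JointInput { double angle; };  // placeholder for the real joint data

// Order and Mode are compile-time constants inside each instance, so the
// optimizer can fold these branches and keep only that instance's path.
template <RotationOrder Order, ScaleMode Mode>
int evaluateJoint(const JointInput& in) {
    int code = (Order == XYZ) ? 0 : 10;
    if (Mode == NoScalePropagation) code += 1;
    return code + static_cast<int>(in.angle);  // stand-in for the real joint math
}

using JointFn = int (*)(const JointInput&);

// One function pointer per (order, mode) combination, indexed by the enums.
const JointFn kJumpTable[RotationOrderCount][ScaleModeCount] = {
    {&evaluateJoint<XYZ, PropagateScale>, &evaluateJoint<XYZ, NoScalePropagation>},
    {&evaluateJoint<ZYX, PropagateScale>, &evaluateJoint<ZYX, NoScalePropagation>},
};

// Runtime options select a fully specialized instance with a single indirect call.
inline int dispatch(RotationOrder order, ScaleMode mode, const JointInput& in) {
    return kJumpTable[order][mode](in);
}

}  // namespace sketch
```

The per-joint branching on options is thereby replaced by one table lookup, and each instance can use the exact XBB transform types (including Identity) for its combination.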
  42. XBB DWA Motion System Results (Before vs. After XBB)
      • hier_apply_fk_around_pivot speedup: 2.8x
      • Motion System speedup: 1.6x
      • Overall speedup: 1.2x
  43. XBB DWA Motion System Scaling
      • Reducing the critical path helped thread scaling.
      • Reached the goal of 30 fps on a single Avoton cartridge.
  44. Call to Action
      • A good way to improve the impact of vectorization or threading is to reduce the amount of work done outside those data-parallel regions.
        – Ideally, do less work in the first place.
      • Complex optimization problems can be represented in C++ and presented back to the compiler in a form it excels at optimizing.
        – Expanding the math by hand is untenable.
      • You can do much more with C++11/14 to encapsulate problems while retaining the original high-level algorithm.
        – Look for optimization problems that might be representable at a higher level.
  45. Future Work
      • XBB has exactly the features required to support the DWA Motion System.
      • For general-purpose use, more transformations and math operations might be required, e.g.
        – Inverse of a general 4x4 matrix
        – A single-precision version or a template-based data type
      • XBB can be licensed or potentially open sourced upon request.
        – Could be of use to CAD, animation tools, and gaming.
      • Contact Alex Wells (alex.m.wells@intel.com)
