Behind the Performance of Quake 3 Engine: Fast Inverse Square Root
Maksym Zavershynskyi
Quake 3 Arena
First-person shooter
Released: 1999
Engine: id Tech 3
Average reviewers’ score: ~9/10
Architecture
• C language
• Client-Server separation
• Virtual Machine
• Local C Compiler for Scripts
• Highly Optimized Code
Shading
Creates depth perception
Material-based shading
[Figure: two images combined (+) to give the shaded result (=)] [1]
What makes a nice picture?
•Shading
•Lighting
•Reflections
•...
Angle of Incidence
[Figure: angle α between the surface normal and the view direction]
Greater α: darker shading
Vector Normalization
(x, y, z) → (a, b, c), a vector of length 1:
$$(a, b, c) = \frac{(x, y, z)}{\sqrt{x^2 + y^2 + z^2}}$$
Fast Inverse Square Root
Inverse Square Root

#include <math.h>

float Q_rsqrt( float number )
{
    return 1.0f / sqrtf( number );
}
Fast Approximate Inverse Square Root
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking   (1)
    i  = 0x5f3759df - ( i >> 1 );               // what the f☀✿k?                          (2)
    y  = * ( float * ) &i;                      //                                          (1)
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration                            (3)
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

(1) Interpret float as integer (and integer as float)
(2) Good initial guess with magic number 0x5f3759df
(3) One iteration of Newton’s approximation
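On current compilers and 64-bit platforms the code above can misbehave: `long` is often 64 bits wide, and reading a `float` through a `long *` violates C’s strict-aliasing rules. Below is a sketch of the same three steps using `uint32_t` and `memcpy`, assuming `float` is an IEEE 754 single-precision value. The name `rsqrt_portable` and the `newton_steps` parameter are hypothetical additions for experimentation, not part of the Quake 3 source.

```c
#include <stdint.h>
#include <string.h>

/* Same steps as Q_rsqrt, with a fixed 32-bit integer and memcpy instead of
   pointer casts. newton_steps = 1 matches the shipped code. */
static float rsqrt_portable(float number, int newton_steps)
{
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);           /* (1) float bits -> integer       */
    i = 0x5f3759dfu - (i >> 1);         /* (2) initial guess, magic number */
    memcpy(&y, &i, sizeof y);           /* (1) integer bits -> float       */

    for (int k = 0; k < newton_steps; ++k)
        y = y * (threehalfs - (x2 * y * y));   /* (3) Newton iteration */

    return y;
}
```

With `newton_steps = 0` you get the raw initial guess, which makes it easy to see how much the single Newton step contributes.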
(1) Interpret float as integer
32-bit float x = 0.15625:
S  E         M
0  01111100  01000000000000000000000
0.15625 = 1.01 × 2^-3 in binary
E = -3 + 127 = 124, or 01111100 in binary
M = .01 (the leading 1 is implicit and not stored)
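The field layout can be checked directly. This sketch assumes `float` is an IEEE 754 single-precision value (true on mainstream platforms) and copies its bits into an integer with `memcpy`; `split_float` and `float_fields` are illustrative names.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision float, plus raw bits. */
typedef struct { uint32_t sign, exponent, mantissa, bits; } float_fields;

/* 1 sign bit, 8 exponent bits (biased by 127), 23 mantissa bits
   (the leading 1 of a normal number is implicit and not stored). */
static float_fields split_float(float f)
{
    float_fields r;
    memcpy(&r.bits, &f, sizeof r.bits);
    r.sign     = r.bits >> 31;
    r.exponent = (r.bits >> 23) & 0xFFu;
    r.mantissa = r.bits & 0x7FFFFFu;
    return r;
}
```

For 0.15625 this yields sign 0, biased exponent 124 (that is, -3 + 127), and mantissa field 0x200000, whose top two bits are the binary fraction .01.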
(1) Interpret float as integer
                          S  E         M
float x = 0.15625         0  01111100  01000000000000000000000
x as integer i            0  01111100  01000000000000000000000   (0x3E200000)
shift right i >> 1        0  00111110  00100000000000000000000   (0x1F100000)
                             E → E/2 (here 124 → 62)
magic number 0x5f3759df   0  10111110  01101110101100111011111
0x5f3759df - (i >> 1)     0  10000000  01001110101100111011111   (0x402759DF)
result: 2.614 (exact value 1/sqrt(x) = 2.52982...)
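The bit trace above can be reproduced in a few lines. A sketch, again assuming IEEE 754 `float`; the struct and helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Intermediate values of the initial-guess step, one per row of the trace. */
typedef struct { uint32_t i, shifted, guess_bits; float guess; } rsqrt_trace;

static rsqrt_trace trace_initial_guess(float x)
{
    rsqrt_trace t;
    t.i = float_bits(x);                     /* x as integer i              */
    t.shifted = t.i >> 1;                    /* i >> 1: exponent halved     */
    t.guess_bits = 0x5f3759dfu - t.shifted;  /* 0x5f3759df - (i >> 1)       */
    t.guess = bits_float(t.guess_bits);      /* reinterpret as float again  */
    return t;
}
```

For x = 0.15625 it yields the integer 0x3E200000, the shifted value 0x1F100000 (biased exponent 62), and the guess bits 0x402759DF, which read back as roughly 2.6149.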
(2) Magic Number: 0x5f3759df
•Gives a good initial guess.
•Keeps the relative error small.
•Searching for a number that minimizes the error of the initial guess, we come up with: 0x5f37642f [4]
Did we find a better magic number? ;)
(3) One iteration of Newton’s method
Newton’s method: given a suitable approximation $y_n$ to the root of $f(y)$, it gives a better one, $y_{n+1}$, using
$$y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}$$
In our case, $y = 1/\sqrt{x}$ is the root of $f(y) = \frac{1}{y^2} - x$, with $f'(y) = -\frac{2}{y^3}$, so
$$y_{n+1} = y_n \left( \frac{3}{2} - \frac{x}{2}\, y_n^2 \right)$$
y = y * ( 1.5f - ( 0.5f * x * y * y ) );
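A worked example of the update rule, applied to the initial guess from the bit trace (2.614 for x = 0.15625). The helper name is illustrative.

```c
/* One Newton step for f(y) = 1/y^2 - x, whose positive root is 1/sqrt(x):
   y_{n+1} = y_n * (1.5 - 0.5 * x * y_n^2). */
static float newton_rsqrt_step(float x, float y)
{
    return y * (1.5f - (0.5f * x * y * y));
}
```

Applying it once to 2.61486 gives about 2.5255 (relative error about 0.17%); a second application comes within 0.001% of 2.52982. Writing $y_n = (1 + e)/\sqrt{x}$, one step gives $(1 - \tfrac{3}{2}e^2 - \tfrac{1}{2}e^3)/\sqrt{x}$, so the relative error is roughly squared each time, and the result approaches the root from below.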
(3) One iteration of Newton’s method
After one iteration of Newton’s method, our magic number 0x5f37642f gives a worse approximation than the original magic number 0x5f3759df!!! [4]
Open Question:
How was the original magic number 0x5f3759df derived?
•Lomont (2003) numerically found a slightly better magic number, 0x5f375a86 [4]
•Robertson (2012) analytically derived the same better magic number, 0x5f375a86 [3]
How good?
Max relative error: 0.177% [3]
With the 2nd iteration of Newton’s method: 0.00047% [3]
How fast?
In 1999: ???
Today, on CPUs: 3-4 times faster
With the 2nd iteration of Newton’s method: 2-2.5 times faster [3]
Who wrote it?
Who?
John Carmack?
Lead programmer of Quake, Doom, Wolfenstein 3D
“...Not me, and I don’t think it is Michael (Abrash). Terje Mathison perhaps?...” [8]
Michael Abrash?
Author of:
Zen of Assembly Language
Zen of Graphics Programming
Who?
Terje Mathisen?
Assembly language optimization for x86
microprocessors.
“... I wrote fast & accurate invssqrt()... for a
computational fluid chemistry problem...
...The code is not the same as I wrote...”
[8]
Who?
Gary Tarolli?
Co-founder of 3dfx (whose assets were later acquired by Nvidia)
“It did pass by my keyboard many many years ago, I may have tweaked the hex constant a bit or so, but other than that I can’t take credit for it, except that I used it a lot and probably contributed to its popularity and longevity.” [8]
This hack is older than 1990!!!
Who?
Cleve Moler: inspiration
Creator of the first MATLAB, one of the founders of MathWorks, and currently its Chief Mathematician [9]
Greg Walsh: author (most probably)
Has been working on Internet and distributed computing technologies since before it was even the Internet, and helped engineer the first WYSIWYG word processor at Xerox PARC while at Stanford University [9]
Who?
Cleve Moler was inspired by code written by Velvel Kahan and K.C. Ng at Berkeley around 1986!!!
http://www.netlib.org/fdlibm/e_sqrt.c [10]
Finally
It is Fast:
3-4 times faster than the straightforward code
It is Good:
0.177% maximum relative error
It can be Improved
It dates back to 1986
Thank you!
http://zavermax.github.io
References
Quake 1 and 3 Architecture
1) Fabien Sanglard, Quake 3 Source Code Review, 2012. http://fabiensanglard.net/quake3/
2) Michael Abrash, Ramblings in Realtime. http://www.bluesnews.com/abrash/
Inverse Square Root
3) Matthew Robertson, A Brief History of InvSqrt, Bachelor’s thesis, University of New Brunswick, 2012.
4) Chris Lomont, Fast Inverse Square Root, Purdue University, Indiana, 2003.
5) Jim Blinn, Floating-Point Tricks, IEEE Computer Graphics and Applications 17(4), 1997.
6) David Eberly, Fast Inverse Square Root (Revisited), Geometric Tools, LLC, 2010.
7) Charles McEniry, The Mathematics Behind the Fast Inverse Square Root Function Code, 2007.
Investigation of the Authorship
8) Rys Sommefeldt, Origin of Quake3’s Fast InvSqrt(), 2006. http://www.beyond3d.com/content/articles/8/
9) Rys Sommefeldt, Origin of Quake3’s Fast InvSqrt() - Part Two, 2007. http://www.beyond3d.com/content/articles/15/
10) http://blogs.mathworks.com/cleve/2012/06/19/symplectic-spacewar/#comment-13
Additional
11) http://en.wikipedia.org/wiki/Fast_inverse_square_root
12) https://github.com/id-Software/Quake-III-Arena
