Fast square root reverse

When calculating lighting, OpenArena (the free port of Quake III: Arena ) calculates the angles of incidence and reflection through a fast reverse square root.

A fast reverse square root (also fast InvSqrt () or 0x5F3759DF using the “magic” constant used ) is a fast approximate algorithm for calculating the inverse square root $y={\frac {1}{\sqrt {x}}}$ ${\ displaystyle y = {\ frac {1} {\ sqrt {x}}}}$ $y = {\ frac {1} {{\ sqrt {x}}}}$ for positive 32-bit floating point numbers . The algorithm uses integer operations “subtract” and “ bit shift ”, as well as fractional “subtract” and “multiply” - without slow operations “split” and “square root”. Despite “hacking” at the bit level, the approximation is continuous : close arguments give a close result. Accuracy (less than 0.2% down and never more) is not enough for real numerical calculations, but it is quite enough for three-dimensional graphics .

Algorithm

The algorithm takes a 32-bit floating point number ( single precision ) as the source data, and performs the following operations on it:

calculates half the value of a number and saves for future use
treating a 32-bit floating point as a 32-bit integer:
- shifts right one bit
- subtracts the number from the "magic" constant 5F3759DF ₁₆
(in the interpretation of the result as a 32-bit floating point at this stage, the first approximation of the inverse square root of the initial number is obtained)
computes one iteration of Newton's method to get a more accurate approximation.

Analysis and error

The structure of a 4-byte fractional is:

Sign
	Order								Mantissa
0	0	one	one	one	one	one	0	0	0	one	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	$=(1+2^{-2})\cdot 2^{-3}=0{,}15625$ ${\ displaystyle = (1 + 2 ^ {- 2}) \ cdot 2 ^ {- 3} = 0 {,} 15625}$ $=(1+2^{-2})\cdot 2^{-3}=0{,}15625$
31				24				23				sixteen				15				eight				7				0

We deal only with positive numbers (the sign bit is zero), not denormalized , not ∞ and not NaN . Such numbers in the standard form are written as 1, xxxx ₂ · 2 ^k ^[1] , and the head unit is not stored ( implicit unit ). In addition, for machine fractional numbers, the offset order : 2 ^{0 is} written as 011.1111.1 ₂ .

If a $x\in (2^{k},2^{k+1})$ ${\ displaystyle x \ in (2 ^ {k}, 2 ^ {k + 1})}$ (and therefore not an exact power of two), then the machine order of the number $x$ ${\ displaystyle x}$ equals a = 011.1111.1 ₂ + k , and the order of the number ${\frac {1}{\sqrt {x}}}$ ${\ displaystyle {\ frac {1} {\ sqrt {x}}}}$ : b = 011.1111.0 ₂ - [ k / 2] , [x] is the integer part of a number. ^[2]

Let's try to deduce k through a and b : k = a - 011.1111.1 ₂ = 111.1110.0 ₂ - 2 b . Hence, b = (1011.1101.1 ₂ - a) / 2 (for even k ).

Now let's see what happens with odd k . In this situation, b is larger by one, and both cases can be combined into a formula: b = [(1011.1110.0 ₂ - a ) / 2] .

If you add mantissa bits to the right (marked with asterisks), then the constructions b *** = [(1011.1110.0 *** 0 - a ***) / 2] and b *** = (101.1111.00 *** - [ a *** / 2]) are distinguished by one low bit of the mantissa. Which gives: the upper byte 5F, the second in the range 00 ... 3F. The remaining bits of the mantissa are selected so as to minimize the error.

Let's estimate the magic constant. Since when $x'=4x$ ${\ displaystyle x '= 4x}$ the mantissa of the result does not change; it suffices to consider two ranges: [1, 2) and [2, 4). It is easy to see that shift-subtraction is actually a rough piecewise linear approximation:

For $x\in [1,2)$ ${\ displaystyle x \ in [1,2)}$ the apparent part of the mantissa is x $x-1$ ${\ displaystyle x-1}$ , the result mantissa starts with 1.1 ₂ = 1.5 (implicit unit, binary comma , then unit coming from order), order 011.1111.0 ₂ ≡ 2 ⁻¹ = 0.5 , and ${\hat {y}}_{1}(x)=0{,}5(1{,}5+(c-1)-0{,}5(x-1))=0{,}5+0{,}5c-0{,}25x$ ${\ displaystyle {\ hat {y}} _ {1} (x) = 0 {,} 5 (1 {,} 5+ (c-1) -0 {,} 5 (x-1)) = 0 { ,} 5 + 0 {,} 5c-0 {,} 25x}$ ( c - the mantissa of the magic constant; according to the above calculations $c<1{,}5$ ${\ displaystyle c <1 {,} 5}$ ).
For $x\in [2,4)$ ${\ displaystyle x \ in [2,4)}$ - the apparent part of the mantissa is x $0{,}5x-1$ ${\ displaystyle 0 {,} 5x-1}$ , the result mantissa begins with 1.0 ₂ = 1 , the order is the same, and ${\hat {y}}_{2}=0{,}5(1+(c-1)-0{,}5(0{,}5x-1))=0{,}25+0{,}5c-0{,}125x$ ${\ displaystyle {\ hat {y}} _ {2} = 0 {,} 5 (1+ (c-1) -0 {,} 5 (0 {,} 5x-1)) = 0 {,} 25 +0 {,} 5c-0 {,} 125x}$ .
When calculated in the previous step ${\hat {y}}$ ${\ displaystyle {\ hat {y}}}$ will be less than 0.5, there is a loan from the order field, and ${\hat {y}}_{3}(x)=0{,}5-{\frac {0{,}5-{\hat {y}}_{2}}{2}}={\frac {3}{8}}+0{,}25c-{\frac {1}{16}}x$ ${\ displaystyle {\ hat {y}} _ {3} (x) = 0 {,} 5 - {\ frac {0 {,} 5 - {\ hat {y}} _ {2}} {2}} = {\ frac {3} {8}} + 0 {,} 25c - {\ frac {1} {16}} x}$ .

Easy to see that ${\hat {y}}_{1}(2)={\hat {y}}_{2}(2)$ ${\ displaystyle {\ hat {y}} _ {1} (2) = {\ hat {y}} _ {2} (2)}$ . ${\hat {y}}_{3}$ ${\ displaystyle {\ hat {y}} _ {3}}$ obviously "stitched" with ${\hat {y}}_{2}$ ${\ displaystyle {\ hat {y}} _ {2}}$ . Matching with ${\hat {y}}_{3}$ ${\ displaystyle {\ hat {y}} _ {3}}$ the result is obtained by writing the formulas for $x\in [4,8)$ ${\ displaystyle x \ in [4,8)}$ . Therefore, our piecewise linear approximation is continuous.

Our calculations are rough and approximate, so we will manage the condition: to minimize the relative error at the ends of the first segment, without taking into account the Newton method. Insofar as ${\frac {1}{\sqrt {1}}}=1$ ${\ displaystyle {\ frac {1} {\ sqrt {1}}} = 1}$ , ${\frac {1}{\sqrt {2}}}\approx 0{,}7071$ ${\ displaystyle {\ frac {1} {\ sqrt {2}}} \ approx 0 {,} 7071}$ and the linear function on the segment falls by 0.25, then ${\hat {y}}(0)=1-{\frac {(1-0{,}7071-0,25)\cdot 1}{1+0{,}7071}}\approx 0{,}9749\equiv 0{,}5+0{,}5c-0{,}25\cdot 1$ ${\ displaystyle {\ hat {y}} (0) = 1 - {\ frac {(1-0 {,} 7071-0.25) \ cdot 1} {1 + 0 {,} 7071}} \ approx 0 {,} 9749 \ equiv 0 {,} 5 + 0 {,} 5c-0 {,} 25 \ cdot 1}$ . Thus, the magic constant is a fractional number with an order of exactly 101.1111.0 ₂ ≡ 2 ⁶³ and a mantissa of approximately 1.45.

Newton's method gives $f(y)={\frac {1}{y^{2}}}-x$ ${\ displaystyle f (y) = {\ frac {1} {y ^ {2}}} - x}$ , $f'(y)=-{\frac {2}{y^{3}}}$ ${\ displaystyle f '(y) = - {\ frac {2} {y ^ {3}}}}$ and $y_{n+1}=y_{n}-{\frac {f(y_{n})}{f'(y_{n})}}={\frac {y_{n}(3-xy_{n}^{2})}{2}}$ ${\ displaystyle y_ {n + 1} = y_ {n} - {\ frac {f (y_ {n})} {f '(y_ {n})}} = {\ frac {y_ {n} (3- xy_ {n} ^ {2})} {2}}}$ . Since the function $f(y)$ ${\ displaystyle f (y)}$ decreases and is convex down; the iteration of Newton's method gives better accuracy if the initial approximation is less than the real one. So the real constant should be less than our calculated 1.45. After one step of Newton's method, the result is quite accurate (+0% −0.17%), which is quite suitable for computer graphics purposes. Such an error persists over the entire range of normalized fractional numbers.

It is not known where the constant 0x5F3759DF ≈ 1.432430 · 2 ⁶³ came from. Looking through Chris Lomont and Matthew Robertson, they found out that the best constant for a float is 0x5F375A86 ≈ 1.432450 · 2 ⁶³ , for double - 0x5FE6EB50C7B537A9. True, for double, the algorithm is meaningless (does not give a gain in accuracy compared to float). More subtle calculations, taking into account the Newton method, give the result that coincides with the Lomont's empirical constant ^[3] .

There are similar algorithms for other degrees, for example, square or cubic root ^[4] .

Implementation from Quake III: Arena

  float Q_rsqrt ( float number )
 {
	 long i ;
	 float x2 , y ;
	 const float threehalfs = 1.5F ;

	 x2 = number * 0.5F ;
	 y = number ;
	 i = * ( long * ) & y ;  // evil floating point bit level hacking
	 i = 0x5f3759df - ( i >> 1 );  // what the fuck?
	 y = * ( float * ) & i ;
	 y = y * ( threehalfs - ( x2 * y * y ) );  // 1st iteration
 // y = y * (threehalfs - (x2 * y * y));  // 2nd iteration, this can be removed

	 return y ;
 }

History

The algorithm was probably developed at Silicon Graphics in the 1990s, and the implementation appeared in 1999 in the source code of the Quake III Arena computer game, but this method did not appear on publicly accessible forums, such as Usenet , until 2002–2003. The algorithm generates fairly accurate results using the unique first approximation of Newton's method . At that time, the main advantage of the algorithm was the rejection of expensive floating-point computational operations in favor of integer operations. Inverse square roots are used to calculate the angles of incidence and reflection for lighting and shading in computer graphics.

The algorithm was originally attributed to John Carmack , but he admitted that Michael Abrash had brought him to id Software . The study of the question showed that the code had deeper roots in both the hardware and software areas of computer graphics. Corrections and changes were made by both Silicon Graphics and 3dfx Interactive , while the implementation of Gary Tarolly for SGI Indigo is known as the earliest use.

With the release of the 1998 instruction set 3DNow! In the AMD processors, an assembler instruction PFRSQRT appeared for a quick approximate calculation of the inverse square root. You can also get a constant for double precision numbers , but this is meaningless, since the accuracy of the calculations will not increase, so the instructions for calculating the fast inverse square root for double precision numbers have not been added.

Motivation

The surface of the normals are widely used in calculations of lighting and shading, requiring the calculation of the norms for vectors. This shows the field of surface normal vectors.

The inverse square root of a floating-point number is used to calculate the normalized vector . Since a 3D graphics program uses these normalized vectors to determine lighting and reflection , millions of these roots must be calculated in a second. Before special hardware was created to handle transformations and lighting , computing software could be slow. In particular, in the early 1990s, when the code was developed, most floating-point calculations lagged behind operations with integers in performance.

To normalize a vector, you must first determine its length by calculating its norm : the square root of the sum of the squares of the vector components , and then divide each component of the vector by the resulting length. It turns out a new vector, called the unit vector, and directed in the same direction as the original vector.

\|{\boldsymbol {v}}\|={\sqrt {{v_{1}}^{2}+{v_{2}}^{2}+{v_{3}}^{2}}}

{\ displaystyle \ | {\ boldsymbol {v}} \ | = {\ sqrt {{v_ {1}} ^ {2} + {v_ {2}} ^ {2} + {v_ {3}} ^ {2 }}}} Is the Euclidean norm of a vector, the analogue of the Euclidean distance between two points in space .

{\boldsymbol {\hat {v}}}={\boldsymbol {v}}/\|{\boldsymbol {v}}\|

{\ displaystyle {\ boldsymbol {\ hat {v}}} = {\ boldsymbol {v}} / \ | {\ boldsymbol {v}} \ |}

Is a normalized unit vector. Computed

{v_{1}}^{2}+{v_{2}}^{2}+{v_{3}}^{2}

{\ displaystyle {v_ {1}} ^ {2} + {v_ {2}} ^ {2} + {v_ {3} ^ {2}}

denote as

{\boldsymbol {x}}

{\ displaystyle {\ boldsymbol {x}}}

,

{\boldsymbol {\hat {v}}}={\boldsymbol {v}}/{\sqrt {x}}

{\ displaystyle {\ boldsymbol {\ hat {v}}} = {\ boldsymbol {v}} / {\ sqrt {x}}}

, determines the ratio of the unit vector and the inverse square root of

x

{\ displaystyle x}

.

Quake III Arena uses a fast square-root algorithm to speed up graphics processing by computational blocks, but since then the algorithm has already been implemented in some specialized hardware vertex shaders using special programmable matrices (FPGA).

Notes

↑ A common record of machine fractional and as 1, xxxx ₂ · 2 ^k , and 0,1xxxx ₂ · 2 ^{k + 1 is} distributed. We will stick to the first option.
↑ In binary numbers, dots stand on the boundaries of nibbles — in particular, the order of the number consists of 7 bits of the upper byte and one bit of the second byte.
↑ Schwydke is counting the square root of the wrapped square in the Victorian magical Constant - analogous piddhіd
↑ Hummus and Magnets

Links

Magical Square Root Implementation In Quake III
C. Lomont, Fast inverse square root , Technical Report, 2003.
Brief History of InvSqrt by Matthew Robertson
Christian Plesner Hansen 0x5f3759df,

[1] A common record of machine fractional and as 1, xxxx ₂ · 2 ^k , and 0,1xxxx ₂ · 2 ^{k + 1 is} distributed. We will stick to the first option.

[2] In binary numbers, dots stand on the boundaries of nibbles — in particular, the order of the number consists of 7 bits of the upper byte and one bit of the second byte.

[3] Schwydke is counting the square root of the wrapped square in the Victorian magical Constant - analogous piddhіd

[4] Hummus and Magnets