The sieve of Eratosthenes is an algorithm for finding all prime numbers up to some integer n, attributed to the ancient Greek mathematician Eratosthenes of Cyrene [1]. As in many cases, the name of the algorithm describes the principle of its operation: a sieve implies filtering, in this case filtering out all numbers except the primes. As one passes through the list, the needed numbers remain, and the unneeded ones (called composite) are excluded.
History
The method is called a "sieve" because, according to legend, Eratosthenes wrote the numbers on a wax-coated tablet and pierced holes in the places where composite numbers were written. The tablet thus became a kind of sieve through which all composite numbers were "sifted out", leaving only the primes. Eratosthenes produced a table of prime numbers up to 1000.
Algorithm
To find all prime numbers not exceeding a given number n by the method of Eratosthenes, perform the following steps:
- Write out all the integers from two to n (2, 3, 4, ..., n) in a row.
- Let a variable p initially equal two, the first prime number.
- Cross out from the list the numbers from 2p to n, counting in steps of p (these are the multiples of p: 2p, 3p, 4p, ...).
- Find the first uncrossed number in the list greater than p, and assign it to the variable p.
- Repeat steps 3 and 4 for as long as possible.
The uncrossed numbers remaining in the list are exactly the prime numbers from 2 to n.
In practice, the algorithm can be improved as follows. In step 3, the numbers can be crossed out starting directly from p², because all composite numbers smaller than p² will already have been crossed out by that point. Accordingly, the algorithm can be stopped as soon as p² becomes greater than n. [2] Also, all primes except 2 are odd, so for them one can count in steps of 2p, starting from p².
Example for n = 30
We write out the natural numbers from 2 to 30 in a row:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
The first number in the list, 2, is prime. We pass along the row, crossing out all multiples of 2 (that is, every second number, starting from 2² = 4). The numbers that remain:
2 3 5 7 9 11 13 15 17 19 21 23 25 27 29
The next uncrossed number is 3. We pass along the row, crossing out all multiples of 3 (every third, starting from 3² = 9). The numbers that remain:
2 3 5 7 11 13 17 19 23 25 29
The next uncrossed number, 5, is prime. We pass along the row, crossing out all multiples of 5 (every fifth, starting from 5² = 25). The numbers that remain:
2 3 5 7 11 13 17 19 23 29
The next uncrossed number is 7. Its square, 49, is greater than 30, so the work is finished: all composite numbers are already crossed out. The remaining numbers are the primes:
2 3 5 7 11 13 17 19 23 29
Pseudocode
An optimized implementation (starting from the squares), in pseudocode [3][4]:
Input: natural number n
Let A be a Boolean array indexed by the numbers from 2 to n,
initially filled with true values.
for i := 2, 3, 4, ..., while i² ≤ n:
    if A[i] = true:
        for j := i², i² + i, i² + 2i, ..., while j ≤ n:
            A[j] := false
Output: all numbers i for which A[i] = true.
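The pseudocode above translates almost line for line into Python; a minimal sketch (the function name `sieve` is ours, not part of the original description):

```python
def sieve(n):
    """Return all primes up to n (sieve of Eratosthenes, optimized to
    start crossing out at i*i and to stop once i*i exceeds n)."""
    if n < 2:
        return []
    a = [True] * (n + 1)      # a[k] means "k is still considered prime"
    a[0] = a[1] = False
    i = 2
    while i * i <= n:         # stop when i^2 exceeds n
        if a[i]:
            # multiples below i^2 were already crossed out by smaller primes
            for j in range(i * i, n + 1, i):
                a[j] = False
        i += 1
    return [k for k in range(2, n + 1) if a[k]]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The result for n = 30 matches the worked example above.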
Algorithm complexity
The complexity of the algorithm is O(n ln ln n) operations when compiling a table of prime numbers up to n [5].
Proof of complexity
When a prime p is selected, the inner loop is executed, performing about n/p operations. The complexity of the algorithm is therefore obtained by evaluating the quantity
n/2 + n/3 + n/5 + ... = n · Σ 1/p, the sum running over all primes p ≤ √n.
Since the number of primes less than or equal to x, denoted π(x), is estimated as x / ln x, and as a consequence the k-th prime is approximately equal to k ln k, the sum can be transformed:
Σ 1/p ≈ 1/2 + Σ (k = 2 ... π(√n)) 1/(k ln k).
Here the term 1/2 for the first prime is separated out to avoid division by zero, since ln 1 = 0. The remaining sum can be estimated by the integral
∫ (2 ... π(√n)) dk / (k ln k) = ln ln k | (2 ... π(√n)) ≈ ln ln √n ≈ ln ln n.
The result for the initial quantity is n (ln ln n + O(1)), that is, O(n ln ln n) operations.
A more rigorous proof (giving a more accurate estimate up to constant factors) can be found in Hardy and Wright, "An Introduction to the Theory of Numbers" [6].
Method Modifications
Unbounded, incremental variant
In this variant, primes are computed sequentially, without an upper bound, [7] as the numbers lying between the composite numbers, which are computed for each prime p starting from its square, with a step of p (or 2p for odd primes) [2]. It can be represented symbolically in the data-flow paradigm as
primes = [2 ..] \ [[p², p² + p ..] for p in primes]
using list-comprehension notation, where \ denotes the difference between arithmetic progressions.
The first prime, 2 (among the increasing positive integers), is known in advance, so there is no vicious circle in this self-referential definition.
Pseudocode of the staged sifting, in an implementation that is inefficient, for simplicity (cf. the following variants):
primes = sieve [2 ..] where sieve [p, ...xs] = [p, ...sieve (xs \ [p², p² + p ..])]
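One common way to realize this unbounded definition in an imperative language is a dictionary of "pending" composites, in the spirit of the sieve discussed by O'Neill [7]; a sketch in Python (the names `primes` and `pending` are ours):

```python
from itertools import count, islice

def primes():
    """Yield primes without an upper bound. For each prime p, the first
    composite scheduled is p*p; afterwards p's marker advances in steps of p."""
    pending = {}                   # composite -> the prime that crosses it out
    for x in count(2):
        p = pending.pop(x, None)
        if p is None:
            yield x                # x was never scheduled, hence prime
            pending[x * x] = x     # first composite produced by x is x^2
        else:
            nxt = x + p            # move p's marker to its next multiple
            while nxt in pending:  # skip slots taken by other primes' markers
                nxt += p
            pending[nxt] = p

print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

At any moment the dictionary holds only one pending composite per discovered prime, which keeps memory proportional to the number of primes found so far.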
Enumerating divisors
The sieve of Eratosthenes is often confused with algorithms that filter out composite numbers gradually, testing each candidate for divisibility by one prime at each stage. [8]
The widely known functional code by David Turner from 1975 [9] is often taken for the sieve of Eratosthenes, [8] but in reality [7] it is a non-optimal variant with trial division over the divisors (an optimal variant uses no divisors larger than the square root of the number being tested). In pseudocode,
primes = sieve [2 ..] where sieve [p, ...xs] = [p, ...sieve [x in xs if x % p > 0]]
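For comparison, Turner's trial-division variant can be sketched in Python with nested generators (illustrative only; each surviving candidate is tested against every earlier prime, which is exactly what makes this variant non-optimal):

```python
from itertools import count, islice

def turner_primes():
    """Trial-division 'sieve': every surviving number is filtered through
    a divisibility test for each previously found prime in turn."""
    def sieve(seq):
        p = next(seq)              # head of the stream is the next prime
        yield p
        # wrap the rest of the stream in one more divisibility filter
        yield from sieve(x for x in seq if x % p > 0)
    return sieve(count(2))

print(list(islice(turner_primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Each prime adds one more nested filter, so the recursion depth (and the per-candidate work) grows with the number of primes produced.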
Segmented Sieve
As Sorenson notes, the main problem with implementing the sieve of Eratosthenes on computers is not the number of operations performed but the memory requirements. [4] For large values of n, the range of primes may exceed the available memory; worse, even for relatively small n, cache usage is far from optimal, since the algorithm, traversing the entire array, violates the principle of locality of reference.
To solve this problem, a segmented sieve is used, in which only a part of the range is sifted at each iteration. [10] The method has been known since the 1970s and works as follows: [4][11]
- Divide the range from 2 to n into segments of some length Δ ≤ √n.
- Find all the primes in the first segment using the ordinary sieve.
- Each subsequent segment ends at some number m. Find all the primes in it as follows:
- Create a Boolean array of size Δ.
- For each prime p ≤ √m among those already found, mark as "non-prime" all multiples of p in the array, stepping through the numbers with step p, starting from the smallest multiple of p in this segment.
If Δ is chosen equal to √n, the memory complexity of the algorithm is estimated as O(√n), and the operational complexity remains the same as that of the ordinary sieve of Eratosthenes. [11]
For cases when n is so large that not all sifting primes smaller than √n fit in memory, as the segmented sieve algorithm requires, slower algorithms with much lower memory requirements are used instead, such as Sorenson's pseudosquares sieve. [12]
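The segmented scheme described above can be sketched in Python as follows (the helper names and the choice Δ = ⌊√n⌋ are ours):

```python
from math import isqrt

def segmented_sieve(n):
    """Sift [2, n] in blocks of length delta = isqrt(n), so that only
    O(sqrt(n)) cells are live at any moment."""
    if n < 2:
        return []
    delta = isqrt(n)
    # first segment: primes up to sqrt(n) by the ordinary sieve
    base = [True] * (delta + 1)
    base[0:2] = [False, False]
    for i in range(2, isqrt(delta) + 1):
        if base[i]:
            for j in range(i * i, delta + 1, i):
                base[j] = False
    primes = [i for i in range(2, delta + 1) if base[i]]
    result = primes[:]
    lo = delta + 1
    while lo <= n:                     # sift one segment [lo, hi] at a time
        hi = min(lo + delta - 1, n)
        mark = [True] * (hi - lo + 1)
        for p in primes:
            # first multiple of p inside the segment, but never below p*p
            start = max(p * p, ((lo + p - 1) // p) * p)
            for j in range(start, hi + 1, p):
                mark[j - lo] = False
        result.extend(lo + k for k, ok in enumerate(mark) if ok)
        lo = hi + 1
    return result

print(segmented_sieve(100))
```

Only the list of base primes and one Δ-sized segment need to be in memory at once, which is the point of the technique.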
Euler sieve
The proof of Euler's identity for the Riemann zeta function contains a mechanism for sifting out composite numbers similar to the sieve of Eratosthenes, but arranged so that each composite number is removed from the list exactly once. A similar sieve is described in Gries & Misra 1978 as a sieve with linear running time (see below).
An initial list is compiled, starting from the number 2. At each stage of the algorithm, the first number in the list is taken as the next prime, and its products with every number in the list are marked for deletion. After that, the first number and all the marked numbers are removed from the list, and the process repeats:
[2] (3) 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 ...
[3] (5) 7 11 13 17 19 23 25 29 31 35 37 41 43 47 49 53 55 59 61 65 67 71 73 77 79 ...
[4] (7) 11 13 17 19 23 29 31 37 41 43 47 49 53 59 61 67 71 73 77 79 ...
[5] (11) 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 ...
[...]
The example here starts from the odd numbers, i.e. after the first stage of the algorithm. Thus, after the k-th stage, the working list contains only numbers coprime with the first k primes (that is, numbers not divisible by any of the first k primes), and it begins with the (k+1)-th prime. All numbers in the list smaller than the square of its first element are prime.
In pseudocode,
primes = sieve [2 ..] where sieve [p, ...xs] = [p, ...sieve (xs \ [p², ...[p * x for x in xs]])]
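A literal (and deliberately inefficient) rendering of this list-rewriting process in Python, for illustration:

```python
def euler_sieve(n):
    """Euler's sieve on an explicit working list: at every stage the head of
    the list is the next prime p, and the products of p with the list members
    are struck out, so each composite is removed exactly once."""
    nums = list(range(2, n + 1))
    primes = []
    while nums and nums[0] * nums[0] <= n:
        p = nums[0]
        primes.append(p)
        # products of p with the current list; each composite appears here
        # exactly once, as (least prime factor) * (coprime cofactor)
        struck = {p * x for x in nums if p * x <= n}
        nums = [x for x in nums[1:] if x not in struck]
    return primes + nums          # remaining numbers are all prime

print(euler_sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Unlike the sieve of Eratosthenes, no number is ever struck out twice, which is the defining property of this variant.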
Sieve of odd numbers only
Since all even numbers except 2 are composite, one can skip the even numbers entirely and operate only on the odd ones. First, this halves the amount of memory required; second, it reduces the number of operations performed by the algorithm (roughly by half).
This can be generalized to numbers coprime not only with 2 (that is, the odd numbers), but also with 3, 5, and so on.
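A sketch of the odd-only variant in Python, where array cell k represents the odd number 2k + 3 (the index arithmetic is the only subtle point; the function name is ours):

```python
def sieve_odd(n):
    """Sieve of Eratosthenes storing only odd numbers:
    index k in the array represents the number 2*k + 3."""
    if n < 2:
        return []
    size = (n - 1) // 2                 # count of odd numbers in [3, n]
    is_prime = [True] * size
    i = 3
    while i * i <= n:
        if is_prime[(i - 3) // 2]:
            # cross out i^2, i^2 + 2i, i^2 + 4i, ... (odd multiples of i);
            # a value step of 2i is an index step of i
            for j in range((i * i - 3) // 2, size, i):
                is_prime[j] = False
        i += 2
    return [2] + [2 * k + 3 for k in range(size) if is_prime[k]]

print(sieve_odd(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The array is half the size of the full sieve's, and the inner loop makes half as many steps per prime.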
Reducing memory consumption
The algorithm of Eratosthenes actually operates on bits of memory. Consequently, memory consumption can be reduced significantly by storing the Boolean variables not as one byte each, but as one bit each, that is, using n/8 bytes of memory.
This approach, "bit packing", complicates access to the bits: any read or write of a bit involves several arithmetic operations. On the other hand, compactness in memory improves substantially. Longer intervals fit into the cache, which works much faster than ordinary memory, so that when working segment by segment the overall speed increases noticeably.
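Bit packing can be sketched in Python with a bytearray, one bit per number; the helper functions make visible the extra shift-and-mask arithmetic that every access costs (all names are ours):

```python
def bit_sieve(n):
    """Sieve of Eratosthenes with bit packing: each number occupies one bit,
    so the table takes about n/8 bytes instead of n bytes."""
    nbytes = n // 8 + 1
    table = bytearray([0xFF]) * nbytes   # bit k of the table = "k is still prime"

    def clear(k):                        # each access costs a shift and a mask
        table[k >> 3] &= ~(1 << (k & 7))

    def test(k):
        return table[k >> 3] & (1 << (k & 7))

    clear(0)
    clear(1)
    i = 2
    while i * i <= n:
        if test(i):
            for j in range(i * i, n + 1, i):
                clear(j)
        i += 1
    return [k for k in range(2, n + 1) if test(k)]

print(bit_sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The extra bit arithmetic is usually repaid by the better cache behavior of the eight-times-smaller table.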
Linear-time Eratosthenes sieve
For each number i in the interval [2 ... n], this algorithm finds its minimal prime divisor lp[i] (lp stands for "least prime").
Also supported is a list of all primes — the pr [] array, initially empty. In the course of the algorithm, this array will be gradually filled.
Initially, all values of lp[i] are filled with zeros. Then iterate over the current number i from 2 to n. There are two possible cases:
- lp[i] = 0: this means that i is prime, since no smaller divisor has been found for it. In this case, assign lp[i] = i and add i to the end of the pr[] list.
- lp[i] ≠ 0: this means that the current number i is composite, and its minimal prime divisor is lp[i].
In both cases, the process of placing values into the lp[] array continues: we take numbers that are multiples of i and update their lp[] values. The main goal, however, is to do this in such a way that every number has its lp[] value set at most once. It is claimed that this can be achieved as follows. Consider the numbers of the form x = p·i, where p runs successively over all primes not exceeding lp[i] (this is exactly why the list of all primes had to be stored). For all numbers of this form, the new value lp[x] is set; it must equal p [13].
Pseudocode
Input: natural number n
Let pr be an integer array, initially empty;
lp an integer array indexed from 2 to n, filled with zeros
for i := 2, 3, 4, ..., to n:
    if lp[i] = 0:
        lp[i] := i
        pr[] += {i}
    for p from pr while p ≤ lp[i] and p * i ≤ n:
        lp[p * i] := p
Output: all numbers in the pr array.
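The same algorithm as a Python sketch (the function name is ours):

```python
def linear_sieve(n):
    """Linear-time sieve: lp[i] is the least prime factor of i; every
    composite p*i is assigned its lp value exactly once, giving O(n)
    total operations."""
    lp = [0] * (n + 1)
    pr = []                       # list of primes found so far
    for i in range(2, n + 1):
        if lp[i] == 0:            # no divisor found yet, so i is prime
            lp[i] = i
            pr.append(i)
        for p in pr:
            if p > lp[i] or p * i > n:
                break
            lp[p * i] = p         # p is the least prime factor of p*i
    return pr, lp

pr, lp = linear_sieve(30)
print(pr)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

As a bonus, the lp[] array gives the least prime factor of every number up to n, which allows fast factorization of any of them afterwards.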
The complexity of the algorithm in practice
The sieve of Eratosthenes is a popular way to evaluate computer performance. [14] As can be seen from the proof of complexity above, discarding the constant and a term very close to zero (ln(ln n − ln ln n) − ln ln 2 ≈ ln ln n), the time complexity of computing all primes less than n is approximated by O(n ln ln n). However, the algorithm has exponential time complexity with respect to the size of the input, which makes it a pseudo-polynomial algorithm. The basic algorithm requires O(n) memory. [15]
The segmented version has the same operational complexity, O(n ln ln n), [6] as the non-segmented version, but reduces the memory requirement to the size of a segment (significantly smaller than the whole array of primes), which is O(√n / ln n). [16] There also exists an optimized segmented sieve of Eratosthenes, very rare in practice, that is built in O(n) operations and occupies O(√n ln ln n / ln n) bits of memory. [15][17][16]
In practice, it turns out that the estimate ln ln n is not very accurate even for the largest practical ranges such as 10^16. [17] After studying the proof of complexity above, it is not hard to see where the inaccuracy comes from: within practical sifting ranges, the constant offsets have a significant effect. [6] The very slowly growing term ln ln n does not become large enough for the constants to be safely neglected until n approaches infinity, which naturally lies beyond any applied sifting range. [6] This means that for realistic input data, the performance of the sieve of Eratosthenes is much better than the asymptotic time-complexity estimates alone would suggest. [17]
It should also be noted that the sieve of Eratosthenes is faster than the sieve of Atkin, with which it is often compared, only for values of n below 10^10. [18] This holds provided the operations take approximately the same number of CPU cycles, which is a reasonable assumption for a single algorithm working over one huge bit array. [19] Under this assumption, the sieve of Atkin is faster than a maximally factorized sieve of Eratosthenes for n above 10^13, but such sifting ranges would require a huge amount of RAM even with "bit packing". [20] However, the section on the segmented sieve of Eratosthenes shows that the assumption of equal time per operation between the two algorithms does not hold under segmentation. [11][18] In turn, this means that the (non-segmented) sieve of Atkin runs slower than the segmented sieve of Eratosthenes as the sifting range grows, owing to the decrease in time per operation for the latter.
Using big-O notation is also not the right way to compare practical characteristics, even among variations of the sieve of Eratosthenes, since this notation ignores constants and offsets, which can be very significant for applied ranges. [6] For example, one variation known as Pritchard's wheel sieve [15] has O(n) performance, [17] but its basic implementation requires either the "one large array" approach [16] (that is, a regular array storing all numbers up to n), which limits its usable range to the amount of available memory, or it must be segmented to reduce memory usage. Pritchard's work reduced the memory requirement to the limit, but the price of this improvement is more complicated computations, which increases the operational complexity of the algorithms. [17]
A popular way to speed up algorithms working with large arrays of numbers is factorization of various kinds. [21] Factoring methods give a significant reduction in operational complexity by optimizing the input data set. [22][23] For index algorithms, wheel (ring) factorization is often used. [22][24] The algorithms for finding all primes up to a given n considered in this article belong, like the sieve of Eratosthenes, to the index algorithms, which makes the wheel factorization method applicable to them. [25]
Despite the theoretical speedup obtained by wheel factorization, in practice there are factors not accounted for in the calculations that can significantly affect the behavior of the algorithm and, as a result, may prevent the expected gain in speed. [26] The most significant of them:
- Multiplication and division. Analytical calculations assume that all arithmetic operations take equal time. In reality this is not so: multiplication and division are performed much more slowly than addition and subtraction. This factor does not affect the sieve of Eratosthenes, which uses only addition and subtraction, but it is very significant for Pritchard's sieve (one of the consequences of the complicated computations mentioned above). [27]
- Compiler optimization. At the compilation stage, the compiler optimizes all programs for better execution by the machine. It is often hard to say what contribution this optimization makes, and whether the contribution will be the same for two different algorithms. [28]
- Cache. The processor uses a cache to speed up fetching instructions and data from memory. Programs with localized memory references therefore run faster. But prime sieving algorithms that use a high degree of factorization often generate random references to memory, which reduces their performance. [28]
To illustrate the contribution of factorization to the performance of sieving algorithms, two tables are given below. They show measured real execution times of the sieve of Eratosthenes and Pritchard's sieve, in seconds, for different ranges n and different degrees of wheel factorization. E_i and P_i denote the sieve of Eratosthenes and of Pritchard respectively, where the index i is the degree of wheel factorization; E_0 and P_0 mean no factorization. [28]
n          E_0       E_1       E_2       E_3       E_4       E_5
500        0.00043   0.00029   0.00027   0.00048   0.00140   0.01035
5,000      0.00473   0.00305   0.00254   0.00293   0.00551   0.03207
50,000     0.05156   0.03281   0.02617   0.02578   0.03164   0.10663
500,000    0.55817   0.35037   0.28240   0.28358   0.28397   0.47028
5,000,000  6.06719   3.82905   3.20722   3.25214   3.28104   3.38455
The table shows that the sieve of Eratosthenes with a medium degree of factorization, E_2, has the best performance. This can be explained by the influence of the cache factor, discussed above, on algorithms with a high degree of factorization.
n          P_0       P_1       P_2       P_3       P_4       P_5
500        0.00147   0.00074   0.00050   0.00051   0.00095   0.00649
5,000      0.01519   0.00777   0.00527   0.00453   0.00430   0.00973
50,000     0.15507   0.08203   0.05664   0.04843   0.04180   0.04413
500,000    1.60732   0.86254   0.61597   0.53825   0.47146   0.43787
5,000,000  16.47706  9.00177   6.57146   5.83518   5.27427   4.88251
In conclusion, note that as n grows, the ratio of running times shifts more and more in favor of the sieve of Eratosthenes, and on the range n = 5,000,000 it is consistently faster for any degree of factorization. This once again confirms the performance loss of Pritchard's sieve due to its complicated computations. [17]
See also
- Sieve of Sundaram
- Sieve of Atkin
- Coriccia
Notes
- ↑ Nicomachus of Gerasa, Introduction to Arithmetic, I, 13. [1]
- ↑ 1 2 Horsley, Rev. Samuel, F.R.S., "Κόσκινον Ερατοσθένους, or The Sieve of Eratosthenes", Philosophical Transactions (1683–1775), Vol. 62 (1772), pp. 327–347.
- ↑ Sedgewick, Robert. Algorithms in C++. — Addison-Wesley, 1992. — ISBN 0-201-51059-6. — p. 16.
- ↑ 1 2 3 Jonathan Sorenson, An Introduction to Prime Number Sieves, Computer Sciences Technical Report #909, University of Wisconsin-Madison, January 2, 1990.
- ↑ Pritchard, Paul, "Linear prime-number sieves: a family tree," Sci. Comput. Programming 9 : 1 (1987), pp. 17-35.
- ↑ 1 2 3 4 5 Hardy and Wright, An Introduction to the Theory of Numbers, p. 349.
- ↑ 1 2 O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes", Journal of Functional Programming, published online by Cambridge University Press, 9 October 2008. DOI: 10.1017/S0956796808007004.
- ↑ 1 2 Colin Runciman, "Functional Pearl: Lazy wheel sieves and spirals of primes", Journal of Functional Programming, Volume 7, Issue 2, March 1997.
- ↑ Turner, David A. SASL Language Manual. Tech. rept. CS/75/1. Department of Computational Science, University of St. Andrews, 1975. (primes = sieve [2..]; sieve (p:nos) = p:sieve (remove (multsof p) nos); remove m = filter (not . m); multsof p n = rem n p == 0)
- ↑ Crandall & Pomerance, Prime Numbers: A Computational Perspective, second edition, Springer, 2005, pp. 121–124.
- ↑ 1 2 3 Bays, Carter; Hudson, Richard H. "The segmented sieve of Eratosthenes and primes in arithmetic progressions to 10^12" // BIT: journal. — 1977. — Vol. 17, no. 2. — P. 121–127. — DOI: 10.1007/BF01932283.
- ↑ J. Sorenson, The Pseudosquares prime sieve , Proceedings of the 7th International Symposium on Algorithmic Number Theory. (ANTS-VII, 2006).
- ↑ David Gries, Jayadev Misra. A Linear Sieve Algorithm for Finding Prime Numbers [1978]
- ↑ Peng, T.A. One Million Primes Through the Sieve, BYTE (Fall 1985), pp. 243–244. Retrieved March 19, 2016.
- ↑ 1 2 3 Paul Pritchard, "A sublinear additive sieve for finding prime numbers", Communications of the ACM 24 (1981), 18–23. MR: 600730.
- ↑ 1 2 3 Paul Pritchard, Fast compact prime number sieves (among others), Journal of Algorithms 4 (1983), 332-344. MR : 729229
- ↑ 1 2 3 4 5 6 Paul Pritchard, "Explaining the wheel sieve", Acta Informatica 17 (1982), 477–485. MR: 685983.
- ↑ 1 2 A. O. L. Atkin and D. J. Bernstein. "Prime sieves using binary quadratic forms" // Mathematics of Computation.
- ↑ Meertens, Lambert. Calculating the Sieve of Eratosthenes // Journal of functional programming.
- ↑ Meertens, Lambert. Calculating the Sieve of Eratosthenes // Journal of functional programming.
- ↑ V.A. Minaev, N.P. Vasilyev, V.V. Lukyanov, S.A. Nikonov, D.V. Nikerov. Index algorithms for computing prime numbers using the ring factorization method (http://vestnik-rosnou.ru/pdf/n4y2013/p29.pdf) // Vestnik. — 2013. — No. 4. — P. 29.
- ↑ 1 2 Jonathan Sorenson. An Analysis of Two Prime Number Sieves // Computer Sciences Department, University of Wisconsin-Madison. — pp. 8–10.
- ↑ V.A. Minaev, N.P. Vasilyev, V.V. Lukyanov, S.A. Nikonov, D.V. Nikerov. Index algorithms for computing prime numbers using the ring factorization method (http://vestnik-rosnou.ru/pdf/n4y2013/p29.pdf) // Vestnik. — 2013. — No. 4. — P. 29.
- ↑ V.A. Minaev, N.P. Vasilyev, V.V. Lukyanov, S.A. Nikonov, D.V. Nikerov. Index algorithms for computing prime numbers using the ring factorization method (http://vestnik-rosnou.ru/pdf/n4y2013/p29.pdf) // Vestnik. — 2013. — No. 4. — P. 30–31.
- ↑ V.A. Minaev, N.P. Vasilyev, V.V. Lukyanov, S.A. Nikonov, D.V. Nikerov. Index algorithms for computing prime numbers using the ring factorization method (http://vestnik-rosnou.ru/pdf/n4y2013/p29.pdf) // Vestnik. — 2013. — No. 4. — P. 30–33.
- ↑ Jonathan Sorenson. An Analysis of Two Prime Number Sieves // Computer Sciences Department, University of Wisconsin-Madison. — pp. 16–18.
- ↑ Jonathan Sorenson. An Analysis of Two Prime Number Sieves // Computer Sciences Department, University of Wisconsin-Madison. — p. 16.
- ↑ 1 2 3 Jonathan Sorenson. An Analysis of Two Prime Number Sieves // Computer Sciences Department, University of Wisconsin-Madison. — p. 17.