Adler-32

Adler-32 is a hash function developed by Mark Adler . It is a modification of the Fletcher checksum. Calculates the value of the checksum in accordance with RFC 1950 for an array of bytes or its fragment. This checksum calculation algorithm differs from CRC32 in performance. Adler-32 is used in the Zlib library. Rolling checksum version of the function is used in rsync utility.

Content

History

As in the case of the Fletcher checksum, when developing the Adler sum, the task was to obtain a checksum with an error detection efficiency comparable to CRC . Although Adler and Fletcher checksum errors are almost the same as relatively weak CRCs, in some important cases they behave much worse than good CRCs.

Algorithm

The Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the string plus one, and B is the sum of all the individual values of A at each step. At the beginning of the execution of the Adler-32 function, A is initialized to one, and B to zero. Amounts are taken modulo 65521 (the largest prime number less than 2 ¹⁶ ). Bytes are written in network order , B takes 2 high bytes.

A function can be expressed as:

  A = 1 + D ₁ + D ₂ + ... + D _n (mod 65521)
  B = (1 + D ₁ ) + (1 + D ₁ + D ₂ ) + ... + (1 + D ₁ + D ₂ + ... + D _n ) (mod 65521)
    = n × D ₁ + ( n -1) × D ₂ + ( n -2) × D ₃ + ... + D _n + n (mod 65521)

  Adler-32 ( D ) = B × 65536 + A

where D is a string of bytes for which the checksum should be calculated, and n is the length of D.

Example

The Adler-32 value for the ASCII line of the "Wikipedia" is calculated as follows:

  ASCII code AB
    (in decimal)
    W: 87 1 + 87 = 88 0 + 88 = 88
    i: 105 88 + 105 = 193 88 + 193 = 281
    k: 107 193 + 107 = 300 281 + 300 = 581
    i: 105 300 + 105 = 405 581 + 405 = 986
    p: 112 405 + 112 = 517 986 + 517 = 1503
    e: 101 517 + 101 = 618 1503 + 618 = 2121
    d: 100 618 + 100 = 718 2121 + 718 = 2839
    i: 105 718 + 105 = 823 2839 + 823 = 3662
    a: 97 823 + 97 = 920 3662 + 920 = 4582

    A = 920 = 398 hex (base 16)
    B = 4582 = 11E6 hex

    Output = 300286872 = 11E60398 hex

The modulo addition operation has no effect in this example, since none of the values reached 65521.

Fletcher Checksum Comparison

The Adler checksum calculation algorithm is identical to the Fletcher calculation algorithm, with some differences. The first difference is that in the case of the Adler function, the sum A is initialized with the value 1. The second difference between the two algorithms is that the sum in the Adler-32 algorithm is calculated modulo a prime number 65521, when as Fletcher sums are calculated modulo $2^{4}$ ${\ displaystyle 2 ^ {4}}$ -one, $2^{8}$ ${\ displaystyle 2 ^ {8}}$ -one, $2^{16}$ ${\ displaystyle 2 ^ {16}}$ -1 (depending on the number of bits used), which are compound numbers . This change in the algorithm was made in order to achieve better bit mixing. Using a prime allows the Adler-32 function to notice differences in some byte combinations that the Fletcher function cannot capture. The third difference, which most strongly affects the speed of the algorithm, is that the sums of the Adler-32 function are calculated over 8-bit, not 16-bit words, which leads to a doubling of the number of iterations of the loop. This leads to the fact that the calculation of the Adler-32 checksum takes from one and a half to two times more amount of time than the Fletcher checksum for data divided into 16-bit words. For data broken by bytes, Adler-32 is faster than the Fletcher algorithm.

Although the Adler checksum is not officially defined for other data word lengths, you can use the largest prime integer less than 2 ⁴ = 16 and 2 ⁸ = 256 to implement 8- and 16-bit Adler checksums for comparison purposes. Having a similar algorithm, the Adler checksum has similar performance with the Fletcher sum. All 2-bit errors were found for data words with a length less than M * (k / 2) bits, where k is the size of the checksum, M is equal to the modulus of the Adler sum. As with the Fletcher checksum, the worst-case probability of an undetected error is observed for the same number of zeros and ones in each data block. Adler-8 and Adler-16 detect all group errors less than k / 2 bits long. Adler-32 detects all group errors of no more than 7 bits in length. Figure 1 shows the dependence of the probability of undetected errors for Adler and Fletcher checksums for a bit error rate of 10 ⁻⁵ .

Figure 1 The probability of undetected errors for Adler and Fletcher checksums for a bit error rate of 10 ⁻⁵ .

The better bit mixing provided by the Adler checksum should have resulted in better error search rates than the Fletcher sum. But, as RFC 3385 shows, Fletcher-32 performs better than 8 KB Adler-32. The Adler checksum is superior to the Fletcher sum only in the case of 16-bit checksums, and only in the area of these sums where the Hamming distance is 3. The problem is that, despite the fact that the prime number used as the operation unit , leads to better bit mixing, resulting in fewer valid checksum values available for codewords. In most cases, this negates the positive effect of better mixing. Thus, the Fletcher checksum exceeds the Adler amount in all cases except the Adler-16 amount applied to short data words. Even an increase in the efficiency of error search may not be worth the increase in computational overhead resulting from the use of modular operations.

Comparison with other checksums

RFC 3385 authors compared error detection performance. The summary of their results is presented in the table:

Algorithm	d	Block	i / byte	T _size	T-look	P _udb	P _uds
Adler-32	3	2 ¹⁹	3	-	-	10 ⁻³⁶	10 ⁻³⁵
Flatcher-32	3	2 ¹⁹	2	-	-	10 ⁻³⁷	10 ⁻³⁶
IEEE-802	3	2 ¹⁶	2.75	2 ¹⁸	0.5 / b	10 ⁻⁴¹	10 ⁻⁴⁰
CRC32C	3	2 ³¹ −1	2.75	2 ¹⁸	0.5 / b	10 ⁻⁴¹	10 ⁻⁴⁰

In the table: d - minimum distance on the block of length of Block, Block - length of the block in bits, i / byte - number of program instructions per byte, T _size - size of the table (if viewing is necessary), T-look - number views per byte, P _udb is the probability of undetected group errors, P _uds is the probability of undetected single errors.
The probabilities of undetected errors in the table above are calculated under the assumption of uniform distribution of data.

Advantages and disadvantages

A “good” hash function has a more or less uniform distribution of calculated values. Obviously, Adler-32 does not satisfy this requirement for short data (the maximum value of A for a 128-byte message is 32640, which is less than 65521 - the number by which the module operation is taken). Due to this drawback, the SCTP protocol developers preferred CRC32 to this algorithm, since the network protocol requires hashing of short sequences of bytes.

Just like for CRC32 , for Adler-32 you can easily construct a collision , that is, for a given hash, find other source data that has the same function value.

It has an advantage over CRC32 in that it is faster calculated by software.

C implementation example

A simple C implementation of the algorithm is the following code:

  uint32_t adler32 ( const unsigned char * buf , size_t buf_length )
 {
     uint32_t s1 = 1 ;
     uint32_t s2 = 0 ;
  
     while ( buf_length - )
     {
         s1 = ( s1 + * ( buf ++ ) ) % 65521 ;
         s2 = ( s2 + s1 ) % 65521 ;
     }
     return ( s2 << 16 ) + s1 ;
  }

See the zlib library code for an effective implementation.

Adler-32 in other programming languages

In Java, there is a class java.util.zip.Adler32 that implements the algorithm. ^[one]
In Perl, there is Digest :: Adler32, see CPAN . ^[2]
Private implementation for Python . ^[3]
In PHP, available through the hash () function
Erlang has several built-in functions (BIF) adler32 (...)
For Go is available in the standard hash / adler32 library

Notes

↑ Adler32 (Java Platform SE 8)
↑ Digest :: Adler32 on CPAN
↑ Adler32 code Archived November 4, 2007.

Literature

Theresa C. Maxino, Philip J. Koopman. The Effectiveness of Checksums for Embedded Control Networks // IEEE Trans. on Dependable and Secure Computing. - 2009 .-- S. 59-72 .
Theresa Maxino. Revisiting Fletcher and Adler Checksums // DSN 2006 Student Forum. - 2006.

[1] Adler32 (Java Platform SE 8)

[2] Digest :: Adler32 on CPAN

[3] Adler32 code Archived November 4, 2007.