Uniqueness distance

The distance of uniqueness (in cryptology) is the number of ciphertext characters at which the conditional informational entropy of the key (and, consequently, the plaintext ) is zero, and the key itself is determined uniquely.

Achieving the distance of uniqueness does not mean that the key (or plaintext) can be found in practice, since the definition does not take into account the practical computability of the key, but only postulates that it can be found, for example, by exhaustive search .

Content

Definition

We define the key reliability function through the conditional informational entropy of the key $Z$ ${\ displaystyle Z}$ and ciphertext characters $y_{1},y_{2},\dots ,y_{n}$ ${\ displaystyle y_ {1}, y_ {2}, \ dots, y_ {n}}$ cryptanalyst intercepts:

f_{0}=H\left(Z\right)

{\ displaystyle f_ {0} = H \ left (Z \ right)}

f_{1}=H\left(Z|y_{1}\right)

{\ displaystyle f_ {1} = H \ left (Z | y_ {1} \ right)}

f_{2}=H\left(Z|y_{1}y_{2}\right)

{\ displaystyle f_ {2} = H \ left (Z | y_ {1} y_ {2} \ right)}

\dots

{\ displaystyle \ dots}

f_{n}=H\left(Z|y_{1}y_{2}\dots y_{n}\right)

{\ displaystyle f_ {n} = H \ left (Z | y_ {1} y_ {2} \ dots y_ {n} \ right)}

f_{n}\leq f_{n-1}\leq \dots \leq f_{1}\leq f_{0}

{\ displaystyle f_ {n} \ leq f_ {n-1} \ leq \ dots \ leq f_ {1} \ leq f_ {0}}

This number of intercepted characters $u$ ${\ displaystyle u}$ at which $f_{u}=0$ ${\ displaystyle f_ {u} = 0}$ and is called the distance of uniqueness.

Approximate Formula

The derivation of the uniqueness distance formula is possible for some “good” cryptosystem, for which the informational entropy of the ciphertext has certain properties of “linearity”:

H\left(Y\right)=N\log L

{\ displaystyle H \ left (Y \ right) = N \ log L}

Where

N

{\ displaystyle N}

- the total number of characters of the ciphertext message

L

{\ displaystyle L}

- the ciphertext alphabet, for simplicity taken equal, including for plain text, and for the encryption key

H\left(y_{1}\dots y_{n}\right)\approx n\log L

{\ displaystyle H \ left (y_ {1} \ dots y_ {n} \ right) \ approx n \ log L}

H\left(y_{1}\dots y_{n}|Z\right)\approx {n \over N}H\left(X\right)

{\ displaystyle H \ left (y_ {1} \ dots y_ {n} | Z \ right) \ approx {n \ over N} H \ left (X \ right)}

the last expression is the "linearization" of the expression

H\left(Y|Z\right)=H\left(X\right)

{\ displaystyle H \ left (Y | Z \ right) = H \ left (X \ right)}

Then from the expressions for joint informational entropy:

H\left(y_{1}\dots y_{n};Z\right)=H\left(y_{1}\dots y_{n}\right)+H\left(Z|y_{1}\dots y_{n}\right)\approx n\log L+f_{n}

{\ displaystyle H \ left (y_ {1} \ dots y_ {n}; Z \ right) = H \ left (y_ {1} \ dots y_ {n} \ right) + H \ left (Z | y_ {1 } \ dots y_ {n} \ right) \ approx n \ log L + f_ {n}}

H\left(y_{1}\dots y_{n};Z\right)=H\left(Z\right)+H\left(y_{1}\dots y_{n}|Z\right)\approx H\left(Z\right)+{n \over N}H\left(X\right)

{\ displaystyle H \ left (y_ {1} \ dots y_ {n}; Z \ right) = H \ left (Z \ right) + H \ left (y_ {1} \ dots y_ {n} | Z \ right ) \ approx H \ left (Z \ right) + {n \ over N} H \ left (X \ right)}

n\log L+f_{n}\approx H\left(Z\right)+{n \over N}H\left(X\right)

{\ displaystyle n \ log L + f_ {n} \ approx H \ left (Z \ right) + {n \ over N} H \ left (X \ right)}

n\approx {{H\left(Z\right)-f_{n}} \over {\log L-H\left(X\right)/N}}={{H\left(Z\right)-f_{n}} \over {\log L\cdot \left({1-{{H\left(X\right)} \over {N\log L}}}\right)}}

{\ displaystyle n \ approx {{H \ left (Z \ right) -f_ {n}} \ over {\ log LH \ left (X \ right) / N}} = {{H \ left (Z \ right) -f_ {n}} \ over {\ log L \ cdot \ left ({1 - {{H \ left (X \ right)} \ over {N \ log L}}} \ right)}}}

Then according to the definition of the distance of uniqueness as $u:f_{u}=0$ ${\ displaystyle u: f_ {u} = 0}$ :

u\approx {H\left(Z\right) \over {\log L-H\left(X\right)/N}}={H\left(Z\right) \over {\log L\cdot \left({1-{{H\left(X\right)} \over {N\log L}}}\right)}}

{\ displaystyle u \ approx {H \ left (Z \ right) \ over {\ log LH \ left (X \ right) / N}} = {H \ left (Z \ right) \ over {\ log L \ cdot \ left ({1 - {{H \ left (X \ right)} \ over {N \ log L}}} \ right)}}}

Expression $\rho =1-{{H\left(X\right)} \over {N\log L}}$ ${\ displaystyle \ rho = 1 - {{H \ left (X \ right)} \ over {N \ log L}}}$ called source redundancy . If the source redundancy is zero, that is, it is impossible to determine from the plain text whether it is correct or not (it does not have checksums or signatures), then the distance of uniqueness becomes equal to infinity, and the cryptosystem is absolutely reliable.

Example

For the Russian language, redundancy is 3.5 bits per character. If a mono-alphabetic cipher is used , then the number of possible keys in it is equal $33!\approx 8,68\cdot 10^{36}$ ${\ displaystyle 33! \ approx 8.68 \ cdot 10 ^ {36}}$ , and the entropy of the key (with an equally probable choice) $H\left(Z\right)=\log _{2}33!\approx 122,7$ ${\ displaystyle H \ left (Z \ right) = \ log _ {2} 33! \ approx 122.7}$ bit .

Then the distance of uniqueness for the Russian text encrypted with a simple replacement cipher is:

u={{H\left(Z\right)} \over {\rho }}\approx 35,1

{\ displaystyle u = {{H \ left (Z \ right)} \ over {\ rho}} \ approx 35.1}

That is, if a cryptanalyst intercepts more than 35 characters of the ciphertext, this will most likely allow (for example, exhaustive search) to restore the original plaintext. If fewer characters are intercepted, the text recovery will be ambiguous (there may be several different plaintext options).

Literature

Gabidulin E.M. , Kshevetsky A.S. , Kolybelnikov A.I. Distance of uniqueness // Information security : a training manual - M .: MIPT , 2011. - 225 p. - ISBN 978-5-7417-0377-9
<a href=" https://wikidata.org/wiki/Track:Q4130888 "> </a> <a href=" https://wikidata.org/wiki/Track:Q26887236 "> </a> <a href = " https://wikidata.org/wiki/Track:Q26887244 "> </a> <a href=" https://wikidata.org/wiki/Track:Q26887242 "> </a>
G.V. Basalova The distance of uniqueness // Fundamentals of cryptography (Russian)

Links

Bruce Schneier : How to Recognize Plaintext (Crypto-Gram Newsletter December 15, 1998 )
Unicity Distance computed for common ciphers