In the field of data compression, the Shannon code, named after its creator, Claude Shannon, is a lossless data compression technique that constructs prefix codes from a set of characters and their probabilities (measured or estimated). It is suboptimal in the sense that it does not achieve the minimum possible expected code length, as Huffman coding does, and it is never better than, though sometimes equal to, the Shannon–Fano code.
This method was the first of its kind: Shannon used the technique in 1948 to prove his theorem on noise-resistant coding in the article "A Mathematical Theory of Communication" [1].
In Shannon coding, the characters are arranged in order from most probable to least probable. Character $a_i$ is assigned a codeword consisting of the first $l_i = \lceil -\log_2 p(a_i) \rceil$ digits of the binary expansion of the cumulative probability $F_i = \sum_{k=1}^{i-1} p(a_k)$. Here $\lceil x \rceil$ denotes the ceiling function, which rounds $x$ up to the nearest integer greater than or equal to $x$.
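The procedure can be written compactly, as in the following Python sketch (not part of the original article). The function name `shannon_code` is illustrative, and plain floating-point arithmetic is assumed for the binary expansion, which is adequate only for short codewords such as those in the example below.

```python
from math import ceil, log2

def shannon_code(probs):
    """Build a Shannon code for a list of probabilities.

    Assumes the probabilities sum to at most 1; they are sorted
    into decreasing order first, as the method requires.
    """
    probs = sorted(probs, reverse=True)   # most probable first
    codes = []
    cumulative = 0.0                      # F_i = p_1 + ... + p_{i-1}
    for p in probs:
        length = ceil(-log2(p))           # l_i = ceil(-log2 p_i)
        # Take the first l_i bits of the binary expansion of F_i.
        code = ""
        frac = cumulative
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            code += str(bit)
            frac -= bit
        codes.append(code)
        cumulative += p
    return codes
```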
Example
This table provides an example of Shannon coding. It is immediately apparent from the resulting codes that the method is less optimal than the Shannon–Fano method.
The first step is to calculate the probability of each character. Then the code length $l_i = \lceil -\log_2 p(a_i) \rceil$ is determined for each probability. For example, for $a_2$ we have $p(a_2) = 0.18$, and since $2^{-3} \le 0.18 < 2^{-2}$, the length is $l_2 = 3$. After that, the cumulative probability $F_i$, the sum of the probabilities of characters $1$ through $i-1$, is computed and converted to binary form. Finally, the codeword is taken as the first $l_i$ digits of the fractional part.
| $a_i$ | $p(a_i)$ | $l_i$ | $F_i = \sum_{k=1}^{i-1} p(a_k)$ | $F_i$ in binary | Final code |
|---|---|---|---|---|---|
| $a_1$ | 0.36 | 2 | 0.00 | 0.0000 | 00 |
| $a_2$ | 0.18 | 3 | 0.36 | 0.0101 | 010 |
| $a_3$ | 0.18 | 3 | 0.54 | 0.1000 | 100 |
| $a_4$ | 0.12 | 4 | 0.72 | 0.1011 | 1011 |
| $a_5$ | 0.09 | 4 | 0.84 | 0.1101 | 1101 |
| $a_6$ | 0.07 | 4 | 0.93 | 0.1110 | 1110 |
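As a check, running the sketch given earlier on the probabilities of this table reproduces the codewords in the last column (a hypothetical usage example, assuming the `shannon_code` function defined above):

```python
probs = [0.36, 0.18, 0.18, 0.12, 0.09, 0.07]
for i, code in enumerate(shannon_code(probs), start=1):
    print(f"a{i}: {code}")
# Prints: a1: 00, a2: 010, a3: 100, a4: 1011, a5: 1101, a6: 1110
```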
Links
- "A Mathematical Theory of Communication" http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf