Clever Geek Handbook

Shannon's algorithm

In the field of data compression, Shannon coding, named after its creator Claude Shannon, is a lossless data compression method that constructs prefix codes from a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the minimum possible expected codeword length, as Huffman coding does, and it is never better than, though sometimes equal to, Shannon-Fano coding.

The method was the first of its kind. Shannon used the technique to prove his noiseless (source) coding theorem in his 1948 article "A Mathematical Theory of Communication" [1].

In Shannon coding, symbols are arranged in order from most probable to least probable. Each symbol is assigned a codeword by taking the first $l_i = \left\lceil -\log p_i \right\rceil$ digits of the binary expansion of the cumulative probability $\sum_{k=1}^{i-1} p_k$, where the logarithm is taken to base 2. Here $\left\lceil x \right\rceil$ denotes the ceiling function, which rounds $x$ up to the nearest integer greater than or equal to $x$.
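The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `shannon_code` is hypothetical, the input is assumed to be already sorted from most to least probable, and exact fractions are used so that the binary digits are not affected by floating-point rounding. The sample probabilities are those of the example in this article.

```python
from fractions import Fraction


def shannon_code(probabilities):
    """Return Shannon codewords for symbols sorted by decreasing probability.

    `probabilities` may contain strings such as "0.36" or Fraction objects.
    (Hypothetical helper written for illustration.)
    """
    codes = []
    cumulative = Fraction(0)  # sum of the probabilities of all earlier symbols
    for p in (Fraction(x) for x in probabilities):
        if p <= 0:
            raise ValueError("probabilities must be positive")
        # l = ceil(-log2 p) is the smallest l with 2**-l <= p; find it exactly.
        length = 0
        while Fraction(1, 2 ** length) > p:
            length += 1
        # Take the first `length` binary digits of the cumulative probability.
        frac, bits = cumulative, []
        for _ in range(length):
            frac *= 2
            bit = int(frac >= 1)
            bits.append(str(bit))
            frac -= bit
        codes.append("".join(bits))
        cumulative += p
    return codes


codes = shannon_code(["0.36", "0.18", "0.18", "0.12", "0.09", "0.07"])
print(codes)  # ['00', '010', '100', '1011', '1101', '1110']
```

Passing the probabilities as strings lets `Fraction` represent them exactly; with plain floats a cumulative sum that falls very close to a binary boundary could yield a different digit.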

Example

The table below gives an example of Shannon coding. The resulting codewords show immediately that the code is less efficient than one produced by the Shannon-Fano method.

The first step is to calculate the probability of each symbol. Then the length $l_i$ is computed for every probability. For example, for $a_2$ it equals three: $2^{-3} \leq 0.18 < 2^{-2}$, so the largest power of two not exceeding 0.18 is $2^{-3}$ and therefore $l_2 = 3$. After that, the sum of the probabilities of symbols 1 to $i-1$ is computed and converted to binary. Finally, the fractional part of that binary number is truncated to its first $l_i$ digits, which form the codeword.
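As a worked instance of these steps, the codeword for $a_4$ can be derived as follows, using the probabilities of this example (0.36, 0.18, 0.18 for the three preceding symbols and 0.12 for $a_4$ itself); the logarithm value is rounded to two decimals.

```latex
\begin{aligned}
l_4 &= \left\lceil -\log_2 0.12 \right\rceil = \left\lceil 3.06 \right\rceil = 4, \\
\sum_{k=1}^{3} p_k &= 0.36 + 0.18 + 0.18 = 0.72 = 0.1011\ldots_2, \\
\text{code}(a_4) &= 1011 .
\end{aligned}
```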

a_i   p(a_i)   l_i   Sum of p(a_k), k = 1..i-1   Sum in binary   Final code
a_1   0.36     2     0.0                         0.0000          00
a_2   0.18     3     0.36                        0.0101          010
a_3   0.18     3     0.54                        0.1000          100
a_4   0.12     4     0.72                        0.1011          1011
a_5   0.09     4     0.84                        0.1101          1101
a_6   0.07     4     0.93                        0.1110          1110
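The codewords in the table can be checked numerically. The short sketch below, with variable names chosen only for illustration, verifies that no codeword is a prefix of another and computes the expected codeword length. As an added point of comparison it also computes the source entropy, since Shannon coding is known to give an expected length between the entropy and the entropy plus one bit.

```python
from math import log2

# Probabilities and final codewords copied from the example table.
probs = [0.36, 0.18, 0.18, 0.12, 0.09, 0.07]
codes = ["00", "010", "100", "1011", "1101", "1110"]

# Prefix property: no codeword may be the beginning of a different codeword.
prefix_free = not any(
    a != b and b.startswith(a) for a in codes for b in codes
)

# Expected codeword length in bits per symbol.
avg_len = sum(p * len(c) for p, c in zip(probs, codes))

# Entropy of the source in bits per symbol (a lower bound for any prefix code).
entropy = -sum(p * log2(p) for p in probs)

print(prefix_free)        # True
print(f"{avg_len:.2f}")   # 2.92
print(f"{entropy:.2f}")   # 2.37
```

The gap of roughly half a bit per symbol between the two figures illustrates the suboptimality mentioned at the start of the article.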

Links

  1. "A Mathematical Theory of Communication" http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf


Source - https://ru.wikipedia.org/w/index.php?title=Sheannon_Algorithm&oldid=90222121

