Adaptive Huffman coding (also called dynamic Huffman coding ) is an adaptive method based on Huffman coding . It allows you to build a code scheme in streaming mode (without preliminary data scanning), without having any initial knowledge from the initial distribution, which allows you to compress the data in one pass. The advantage of this method is the ability to code on the fly.
Content
Algorithms
There are several implementations of this method, the most notable are FGK (FGK: Foller, Gallager and Knut ) and the Witter algorithm.
FGK Algorithm
It allows you to dynamically adjust the Huffman tree without initial frequencies. There is a special external node in the Huffman FGC tree, called the 0-node , used to identify incoming characters. That is, whenever a new symbol is encountered, its path in the tree starts from the zero node. The most important thing is that you need to truncate and balance the Huffman FGK tree if necessary, and update the frequency of the connected nodes. As soon as the frequency of the symbol increases, the frequency of all its parents should also be increased. This is achieved by sequentially rearranging nodes, subtrees, or both.
An important feature of the FGK tree is the principle of brotherhood (or rivalry): each node has two descendants (nodes without descendants are called sheets) and the weights are descending. Due to this property, the tree can be stored in a regular array, which increases productivity. [1] [2]
Witter Algorithm
The code is presented in the form of a tree structure in which each node has a corresponding weight and a unique number.
The numbers go down and from right to left.
Weights must satisfy the principle of brotherhood. Thus, if A is the parent node of B and C is a descendant of B, then .
Weight is just the number of characters seen before.
A set of nodes with the same weights are a block .
To get the code for each node, in the case of a binary tree, we could just go all the way from the root to the node, writing, for example, “1” if we go right, and “0” if we go left.
Also, this algorithm uses a special sheet (a node without descendants), NYT (from English not yet transmitted - a symbol not yet transmitted) from which new symbols that have never been seen before “grow”.
The encoder and decoder start only with the root node, which has the maximum weight. In the beginning, this is our NYT node.
When we pass the NYT character, we must first pass the code of the node itself, and then the data.
For each character that is already in the tree, we should only transmit the code of the end nodes (sheets).
For each transmitted symbol, the transmitter and receiver perform the update procedure:
- If the current character is not encountered, add two child nodes to NYT: one for the next NYT, the other for the character. Increase the weight of the new sheet and the old NYT and go to step 4. If the current character is not NYT, go to the character sheet.
- If this node does not have the largest weight in the block, swap it with the node with the largest number, except if this node is the parent element [3]
- Weight increase for current node
- If this is not the root node, go to the parent node then go to step 2. If this is the root, end.
Note: replacing nodes means replacing weights and associated symbols, but not numbers.
Example
We start with an empty tree.
For "a" we pass its binary code.
NYT spawns two child nodes: 254 and 255. Increase the weight of the root. Code “a” associated with node 255 becomes 1.
For “b”, transmit 0 (node NYT code), then its binary code.
NYT spawns two child nodes: 252 for NYT and 253 for b . We increase the weight of 253, 254 and the root. The code for "b" is 01.
For the next b, 01 is passed.
We go to sheet 253. We have a weight block of 1 and the largest number in block 255, so we change the weights and symbols of nodes 253 and 255, increase the weight, go to the root and increase the weight of the root.
In the future, the code “b” is 1, and for “a” it is now 01, which reflects their frequency.
Notes
- ↑ [1] , FGK algorithm
- ↑ [2] , adaptive Huffman coding
- ↑ Adaptive Huffman Coding . Cs.duke.edu. Date of treatment February 26, 2012.
Literature
- Vitter's original paper: JS Vitter, “ Designing and Analyzing Dynamic Huffman Codes, ” ACM Journal, 34 (4), October 1987, 825-845.
- JS Vitter, “ALGORITHM 673 Dynamic Huffman Coding”, ACM Transactions on Mathematical Software, 15 (2), June 1989, pp 158–167. Also appears in Collected Algorithms of ACM.
- Donald E. Knuth, “Dynamic Huffman Coding”, Journal of Algorithm, 6 (2), 1985, pp 163-180.
Links
- Paul E. Black. Adaptive Huffman coding . Dictionary of algorithms and data structures . NIST .