ROLZ (Reduced Offset LZ, a Lempel-Ziv algorithm with a reduced set of offsets) is a dictionary-based data compression algorithm similar to LZ77, but one that uses contextual methods to reduce the number of active offsets. The concept of ROLZ was first introduced by Malcolm Taylor in his RK archiver in 1999, and it remains one of the more modern approaches to building fast and efficient compression algorithms.
In standard LZ77, a match is encoded by a pair:
- match length
- offset (also called distance)
The problem with this scheme is redundancy in the coding. For example, with a dictionary size of 4 KB there are 4096 possible offsets. Clearly most of them will never be used, and if out of 4096 options only, say, 512 are actually used, then 9 bits suffice to encode the offset instead of 12 (a 25% reduction).
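The arithmetic behind this saving can be checked with a short helper that computes the number of bits needed to address one of N distinct values (`bits_for` is a name chosen here for illustration, not part of any described algorithm):

```c
/* Number of bits needed to encode one of n distinct values (n >= 1). */
static int bits_for(unsigned n) {
    int bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}
```

With 4096 stored offsets this gives 12 bits, with 512 only 9, matching the 25% saving above.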
Related algorithms
Many authors have attempted to reduce the set of possible offset values; notable examples include:
LZFG-C2
Authors: Edward R. Fiala, Daniel H. Greene, 1989.
Matches are encoded not by a [length, offset] pair but by a special index corresponding to a specific string in the dictionary. In other words, identical phrases share the same index, which makes the coding of matches economical.
LZRW4
Author: Ross Williams, 1991.
In fact, LZRW4 is quite close to ROLZ. Although the author never built a full-fledged program, his demonstration compressor already shows the ROLZ scheme in draft form.
LZP1-LZP4
Author: Charles Bloom, 1995.
LZP is a dictionary compression algorithm that, when encoding a match, dispenses with offsets altogether. This is possible because the offsets associated with the current context are stored in a special table, and the compressor and decompressor maintain this table in exactly the same way.
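The LZP idea can be sketched as follows. This is a minimal illustration, not Bloom's actual code: the table size, the order-2 context, and the counting of literals versus matches are all choices made here for demonstration.

```c
#define LZP_TAB 4096
#define LZP_MIN 1

/* Map the order-2 context (two preceding bytes) to a table slot. */
static unsigned lzp_hash(const unsigned char *p) {
    return ((unsigned)p[-2] << 8 | p[-1]) & (LZP_TAB - 1);
}

/* Count how many literals vs. predicted matches a toy LZP pass would emit.
 * No offset is ever coded: the table predicts the match position. */
static void lzp_count(const unsigned char *buf, int n,
                      int *literals, int *matches) {
    int tab[LZP_TAB];
    int pos = 2;                      /* two context bytes are needed */
    for (int i = 0; i < LZP_TAB; i++) tab[i] = -1;
    *literals = 2;                    /* first two bytes go out raw */
    *matches = 0;
    while (pos < n) {
        unsigned h = lzp_hash(buf + pos);
        int pred = tab[h];            /* position predicted by the context */
        tab[h] = pos;                 /* both sides update identically */
        int len = 0;
        if (pred >= 0)
            while (pos + len < n && buf[pred + len] == buf[pos + len])
                len++;
        if (len >= LZP_MIN) {         /* match: only a length is coded */
            (*matches)++;
            pos += len;
        } else {                      /* miss: code the literal itself */
            (*literals)++;
            pos++;
        }
    }
}
```

Since the decompressor rebuilds the same table from already-decoded data, the predicted position never has to be transmitted.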
LZ77-PM
Authors: Dzung T. Hoang, Philip M. Long, Jeffrey Scott Vitter, 1995.
This algorithm is similar to ROLZ, with the difference that variable-length contexts, rather than fixed-order contexts, are used to reduce the active offsets.
ROLZ
According to the author's description, this is a fast Lempel-Ziv algorithm with a large dictionary that is able to exploit matches at large distances in the dictionary both quickly and efficiently.
It should be noted that RK is a commercial, closed-source archiver, and many details of its algorithms have never been disclosed. Thanks to enthusiasts, however, the secret was partly uncovered, and several free programs based on this compression algorithm were even written.
To reduce the number of active offsets, a fixed-order context is used. In the original this is a first-order context (i.e., the single character preceding the current one); other contexts are possible as well, say a second-order context (the two characters preceding the current one), and so on.
Instead of searching for matches at every offset in the dictionary, the search is restricted to those offsets that were preceded by the current context. In the simplest case an offset table can be used:
// find the longest match
context = buf[position - 1];              // current first-order context
max_match = 0;                            // match length to encode
index = 0;                                // match index
for (i = 0; i < TAB_SIZE; i++) {          // for each offset stored in the table for this context
    offset = tab[context][i];             // fetch the offset
    match_length = get_match(offset);     // find the length of the match at that offset
    if (match_length > max_match) {       // found a longer match
        max_match = match_length;
        index = i;                        // remember its index
    }
}
// update the table
for (i = TAB_SIZE - 1; i > 0; i--)        // shift all offsets up, discarding the oldest
    tab[context][i] = tab[context][i - 1];
tab[context][0] = position;               // then add the current position
// encode a literal or a match
if (max_match >= MIN_MATCH) {
    encode_match_length(max_match);       // encode the match length
    encode_match_index(index);            // encode the match index
    position += max_match;
} else {                                  // no sufficiently long match found
    encode_literal(buf[position]);        // encode the literal
    position++;
}
This is the simplest approach. In practice, iterating over, say, 1024 offsets at every step can take too much time. To speed up the search, hashing and the various fast-lookup structures used in widespread implementations of the LZ77 family can be applied.
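For a first-order context the context byte itself can index the table directly, as in the listing above; for higher-order contexts the context is typically hashed down to a table index. A possible sketch (the multiplicative constant and table size are illustrative choices, not taken from any particular ROLZ implementation):

```c
#define CTX_TAB_BITS 16

/* Map an order-2 context (two preceding bytes) to a table slot. */
static unsigned ctx_slot(unsigned char c1, unsigned char c2) {
    unsigned h = (unsigned)c1 << 8 | c2;          /* combine the context bytes */
    h = (h * 2654435761u) >> (32 - CTX_TAB_BITS); /* multiplicative hash */
    return h;                                     /* index in [0, 2^16) */
}
```

The same function must be used by compressor and decompressor so that both sides select the same row of the offset table.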
In the original ROLZ, literals are encoded with a first-order context model, and this is no accident. The point is that this scheme encodes more literals than standard LZ77, since very short matches are simply never produced by the ROLZ scheme. For example, with a first-order context and a minimum match length of three characters, the effective minimum match length becomes four (1 character of context + 3 characters of minimum match = 4). A first-order context model, or a more complex order-1-0 PPM model, can compensate for this shortcoming of the algorithm when encoding literals.
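In its simplest form, an order-1 literal model keeps a separate adaptive frequency table for each possible preceding byte. A sketch of the counting part follows; the entropy coder that would consume these counts, and the increment of 32, are assumptions for illustration:

```c
/* Per-context adaptive counts for an order-1 literal model:
 * freq[ctx][sym] estimates P(sym | previous byte == ctx). */
static unsigned freq[256][256];
static unsigned total[256];

static void model_init(void) {
    for (int c = 0; c < 256; c++) {
        total[c] = 256;
        for (int s = 0; s < 256; s++)
            freq[c][s] = 1;               /* uniform prior */
    }
}

/* Report the probability numerator/denominator for `sym` in context
 * `ctx` (to be fed to an arithmetic coder), then update the model. */
static void model_encode(unsigned char ctx, unsigned char sym,
                         unsigned *num, unsigned *den) {
    *num = freq[ctx][sym];
    *den = total[ctx];
    freq[ctx][sym] += 32;                 /* adapt toward recent symbols */
    total[ctx] += 32;
}
```

Because the model conditions on the preceding byte, frequent pairs such as "q" followed by "u" quickly become cheap to encode.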
ROLZ2-ROLZ3
Author: Malcolm Taylor, 2005.
These algorithms represent a further development of ROLZ:
- ROLZ2 was designed for maximum decompression speed.
- ROLZ3 targets maximum compression, at a slight cost in decompression speed.
Both algorithms use a first-order context to reduce active offsets, and they are also able to use a table of up to 32,000 offsets for each context.
- ROLZ2 uses a simple and fast first-order model to encode literals.
- ROLZ3 uses a more sophisticated CM (Context Mixing) second-order model.
But the main distinguishing feature of these algorithms is the use of optimal parsing during encoding. The dynamic-programming technique applied here computes an optimal sequence of literals and matches many steps ahead, basing each choice on the real cost of encoding a literal or a match.
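The idea behind such optimal parsing can be illustrated with a small dynamic program over bit costs. This is a toy sketch, not Taylor's implementation: the flat cost constants stand in for the adaptive estimates a real coder would use, and the match lengths are assumed to come from a separate match finder.

```c
#define LIT_COST   9   /* assumed bit cost of coding one literal */
#define MATCH_COST 20  /* assumed flat bit cost of coding one match */

/* cost[i] = cheapest number of bits to encode buf[i..n).
 * match_len[i] is the longest match available at position i (0 if none),
 * supplied by the match finder. take_match[i] records the decision. */
static void optimal_parse(const int *match_len, int n,
                          int *cost, int *take_match) {
    cost[n] = 0;
    for (int i = n - 1; i >= 0; i--) {
        cost[i] = LIT_COST + cost[i + 1];         /* option 1: literal */
        take_match[i] = 0;
        int len = match_len[i];
        if (len > 0 && MATCH_COST + cost[i + len] < cost[i]) {
            cost[i] = MATCH_COST + cost[i + len]; /* option 2: match */
            take_match[i] = 1;
        }
    }
}
```

Working backward from the end of the block, each position ends up with the globally cheapest continuation, so a locally attractive match is rejected whenever coding a literal leads to a better parse later on.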
See also
- Arithmetic coding
- LZW
Links
- RK and WinRK Homepage
- Compressor BALZ (Public Domain)
- QUAD Compressor (LGPL)
- Description and source code of LZRW1-LZRW4 - a theoretical justification of the advantages of ROLZ
- Descriptions and source code of various LZP and LZCB variants
- Arturo Campos. Description of the LZP Algorithm