Incremental coding

Incremental encoding , also known as front compression or rear compression , is a type of delta encoding where common prefixes or suffixes and their lengths are written in such a way as to avoid duplication of data. This algorithm is well suited for compressing sorted data , such as a list of words in a dictionary .

For example:

Input data

General prefix

Compressed output

  myxa
 myxophyta
 myxopod
 nab
 nabbed
 nabbing
 nabit
 nabk
 nabob
 nacarat
 nacelle

  start of data
 'myx'
 'myxop'
 no common prefix
 'nab'
 'nabb'
 'nab'
 'nab'
 'nab'
 'na'
 'nac'

  0 myxa
 3 ophyta
 5 od
 0 nab
 3 bed
 4 ing
 3 it
 3 k
 3 ob
 2 carat
 3 elle

64 bytes

46 bytes

This method was used as the base for the GNU locate utility in the index of file and directory names. Also, delta encoding is used for common prefix lengths. This means an additional step in which instead of the length of the common prefix, a change in the length of the common prefix is used.

Despite its simplicity, incremental encoding can save a lot of memory, especially when used in front of other archivers such as gzip or bzip2 .

Incremental coding

More articles: