DjVu (from French déjà vu, “already seen”) is a lossy image compression technology developed specifically for storing scanned documents such as books, magazines, and manuscripts, where an abundance of formulas, diagrams, drawings, and handwritten characters makes full text recognition extremely laborious. It is also an effective solution when every nuance of a document's appearance must be conveyed, as with historical documents, where not only the content matters but also the color and texture of the paper; parchment defects such as cracks and fold marks; corrections, blots, and fingerprints; and traces left by other objects.
| DjVu | |
|---|---|
| Filename extension | .djvu or .djv |
| MIME type | image/vnd.djvu, image/x-djvu |
| Signature | AT&TFORM |
| Developer | AT&T Labs-Research, LizardTech (since March 2000 [1]) |
| Published | 1998 |
| Latest version | 27 (July 2006 [2]) |
| Format type | Electronic document |
| Open format? | Yes |
| Site | |
The technology was originally developed by Yann LeCun, Léon Bottou, and Patrick Haffner at AT&T Labs from 1996 to 2001. DjVu has become the basis for several libraries of scientific books. It is quite popular, and a large number of documents are available in the format.
The format is optimized for network transmission, so a page can be viewed before the file has finished downloading. A DjVu file can contain a text (OCR) layer, which allows full-text search within the file. A DjVu file may also contain a built-in interactive table of contents and active areas (links), which make navigation in DjVu books convenient.
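The signature given in the infobox, `AT&TFORM`, can be used to recognize a DjVu file from its first bytes. The following is a minimal sketch, not a validator: the function name `looks_like_djvu` is hypothetical, and the check only compares the leading eight bytes without parsing the rest of the file.

```python
# Signature bytes as listed in the infobox.
DJVU_SIGNATURE = b"AT&TFORM"


def looks_like_djvu(data: bytes) -> bool:
    """Return True if the data begins with the DjVu signature.

    Hypothetical helper for illustration; it inspects only the
    first eight bytes and does not validate the file structure.
    """
    return data[:len(DJVU_SIGNATURE)] == DJVU_SIGNATURE


if __name__ == "__main__":
    print(looks_like_djvu(b"AT&TFORM" + b"\x00" * 8))  # starts with the signature
    print(looks_like_djvu(b"%PDF-1.4"))                # some other file type
```

In practice the bytes would come from reading the start of a file opened in binary mode.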
Technologies Used
DjVu compresses color images with a special technique that divides the original image into three layers: foreground, background, and a black-and-white (one-bit) mask. The mask is stored at the resolution of the source file; it contains the text and other sharp details. The background, which holds the illustrations and page texture, is stored at a reduced resolution by default to save space. The foreground holds the color information for the mask; its resolution is usually reduced even further. The background and foreground are then compressed using a wavelet transform, and the mask is compressed using the JB2 algorithm.
A distinctive feature of the JB2 algorithm is that it searches for repeated characters on the page and stores each character's image only once. In multi-page documents, each group of several consecutive pages shares a common "dictionary" of images.
Most books can be compressed using only two colors. In that case only one layer is used, which yields record compression ratios. For a typical book with black-and-white illustrations scanned at 600 dpi, the average page size is about 15 KB, roughly 100 times smaller than the original file. With a complex background, the size reduction is usually 4 to 10 times. However, the standard DjVu settings use lossy compression, so lossless formats such as PNG, JPEG 2000, and TIFF are usually used for critical documents. DjVu can also compress losslessly: for example, the cjb2 utility from the DjVuLibre package provides lossless compression.
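The way the three layers combine can be illustrated with a small sketch. It assumes the layers are already decoded into plain grids and that the lower-resolution layers are enlarged by simple pixel repetition; a real decoder performs wavelet decoding and smoother upsampling. The function name `compose_page` and the scale parameters are hypothetical.

```python
def compose_page(mask, foreground, background, fg_scale, bg_scale):
    """Rebuild a page from a full-resolution 1-bit mask and two
    lower-resolution color layers (illustrative sketch only).

    mask        -- rows of 0/1 at full page resolution
    foreground  -- color grid, reduced by a factor of fg_scale
    background  -- color grid, reduced by a factor of bg_scale
    """
    page = []
    for y, row in enumerate(mask):
        out_row = []
        for x, bit in enumerate(row):
            if bit:
                # Mask pixel set: take the color from the foreground layer.
                out_row.append(foreground[y // fg_scale][x // fg_scale])
            else:
                # Mask pixel clear: the background shows through.
                out_row.append(background[y // bg_scale][x // bg_scale])
        page.append(out_row)
    return page


if __name__ == "__main__":
    mask = [
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    foreground = [["ink"]]                     # 1x1 grid, scale 4
    background = [["p0", "p1"], ["p2", "p3"]]  # 2x2 grid, scale 2
    for line in compose_page(mask, foreground, background, 4, 2):
        print(line)
```

Because the mask keeps the full resolution, the edges of the text stay sharp even though both color layers are stored much more coarsely.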
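The idea of storing each repeated character image only once can be sketched as follows. This is a simplified illustration, not the JB2 algorithm itself: it assumes the page has already been segmented into symbol bitmaps with positions, and it merges only exactly identical bitmaps. The function name `encode_symbols` is hypothetical.

```python
def encode_symbols(symbols):
    """Split a page into a dictionary of unique bitmaps plus placements.

    symbols -- iterable of (bitmap, x, y), where bitmap is a list of
               rows of 0/1 pixels and (x, y) is its position on the page.
    Returns (dictionary, placements): each placement is
    (dictionary_index, x, y). Illustrative sketch: only exact
    duplicates are merged.
    """
    dictionary = []   # unique bitmaps, each stored once
    index_of = {}     # bitmap -> its index in the dictionary
    placements = []
    for bitmap, x, y in symbols:
        key = tuple(tuple(row) for row in bitmap)  # hashable form
        if key not in index_of:
            index_of[key] = len(dictionary)
            dictionary.append(key)
        placements.append((index_of[key], x, y))
    return dictionary, placements


if __name__ == "__main__":
    glyph_a = [[1, 0], [1, 1]]
    glyph_b = [[0, 1], [1, 1]]
    page = [(glyph_a, 0, 0), (glyph_b, 3, 0), (glyph_a, 6, 0)]
    dictionary, placements = encode_symbols(page)
    print(len(dictionary), placements)
```

The more often a shape recurs, the larger the saving: the page stores one bitmap per distinct shape and only a small index and position for every occurrence.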
The DjVu format is based on several technologies, including those developed by AT&T Labs:
- an algorithm for separating text from background on a scanned image;
- the IW44 wavelet algorithm for compressing the background;
- the JB2 algorithm for compressing black-and-white images (similar to JBIG2);
- the general-purpose Deflate compression algorithm;
- on-demand unpacking algorithm;
- algorithm for masking images.
DjVu Image Text Representation
The DjVu format provides for a text layer that contains the text of the page; it is used for searching and for easily copying text. If no text layer is available, the only way to obtain the text is to run optical character recognition in third-party programs.
Licensing
In March 2000 [1] AT&T sold the technology to LizardTech [3], which sought to exploit it commercially. Because the format is open [4], however, free software for creating and viewing DjVu documents is available on various platforms. On July 1, 2008, LizardTech transferred management of DjVu to its parent company Celartem [5], and on June 10, 2009, Celartem transferred the rights to DjVu technology to Cuminas (formerly Caminova [6]) [7].
The open-source DjVuLibre library is published under the GNU GPL, and encoders and viewers are built on it.
See also
- Digitization of books
Notes
- [1] See the README file in the DjVuLibre-3.5.27.tar.gz package.
- [2] DjVu File Format Versions. DjVu Developers (July 2006). Retrieved January 16, 2010.
- [3] DjVuLibre History and Credits.
- [4] License.
- [5] LizardTech press release.
- [6] Caminova - Cuminas transition (dead link). Retrieved October 25, 2014. Archived October 5, 2014.
- [7] LizardTech press release.
Links
- DjVu: a format for electronic libraries. Algorithms and advantages, programs and usage, user manuals.
- Deja Vu or DjVu: a primer. Collection of articles about DjVu technology.
- DjVu.org: English-language portal of the DjVu community.