Segmentation process

Segmenting a scanned image (encoded as a “pixel map”) into a list of characters or “foreground shapes” can never be a perfect process, because it is inherently subjective and depends on human understanding of the document.

Ideally, we would like to segment only foreground shapes that carry some kind of semantic meaning, such as text and drawings, while leaving out what is supposed to be an image. For a Westerner, a scroll of calligraphic Chinese characters should probably not be segmented, as it is a painting, not a text with an understandable meaning.

The Document Express segmenter is far from this ideal: it has no capability for character recognition or for syntactic and semantic analysis. For instance, it does not perform OCR (optical character recognition) to check whether a shape is a letter that belongs to a word. The reason is that even something as simple as OCR would be too computationally expensive to complete a typical segmentation in less than a second.

The Document Express segmenter relies on a model of the document with two layers (foreground and background) and an encoding scheme for each layer. These encoding schemes try to “compress” the document as efficiently as possible, with a loss of quality that is not easily visible to the human eye. The quality of an encoding scheme is measured by its coding cost, that is, a number of bits. Suppose you want to send the document: you start by counting how many bits are required to send the compressed, encoded document. Because the “lossy” compression degrades visual quality and sometimes introduces errors that are visible to the eye, you must also send corrections, which cost additional bits to encode.

Document Express finds the segmentation that minimizes this total coding cost: a tradeoff between high compression and some errors.
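
The tradeoff described above can be illustrated with a minimal Python sketch. It is not the Document Express implementation: the `Candidate` class, the `best_segmentation` function, and all bit counts are hypothetical and chosen only to show how a total coding cost (compressed document plus corrections) selects one segmentation over another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate segmentation with made-up bit counts."""
    name: str
    compressed_bits: int  # bits to send the lossy, compressed document
    correction_bits: int  # bits to send corrections for visible errors

    @property
    def total_bits(self) -> int:
        # Total coding cost = compressed document + corrections.
        return self.compressed_bits + self.correction_bits


def best_segmentation(candidates):
    """Return the candidate with the smallest total coding cost."""
    return min(candidates, key=lambda c: c.total_bits)


# Illustrative numbers only; they do not come from a real document.
candidates = [
    Candidate("aggressive", compressed_bits=40_000, correction_bits=25_000),
    Candidate("balanced", compressed_bits=48_000, correction_bits=9_000),
    Candidate("conservative", compressed_bits=70_000, correction_bits=1_000),
]

best = best_segmentation(candidates)
print(best.name, best.total_bits)
```

With these illustrative numbers, the most aggressive compression loses because its corrections are expensive, and the most conservative one loses because the compressed document itself is large.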

The current segmenter is optimized for a high compression rate, so that 300 dpi scanned pages can be downloaded over a low-bandwidth 56K connection. In the process, some very small or thin characters may be dropped.

Fortunately, this tradeoff can be rebalanced. We describe which parameters of the segmenter have the strongest impact on the foreground-background segmentation (for more on how to optimize segmentation, see the Segmenter Reference Guide and Tutorial).

Segmentation Layers

During the encoding process, Document Express segments color images into multiple layers. Each layer is then encoded using a method that is most appropriate for the elements contained in that layer. Because segmentation allows several encoding methods to be implemented in a single color image, the size of the output DjVu® file is very small.

Each color image is segmented into the following layers:

Layer	Description
Mask	Contains black and white text and line drawings, which have sharp edges and uniform colors. To maintain their high-contrast appearance and readability, these elements are encoded at full resolution (typically 300 dpi), using the JB2 data compression method.
Background	Contains color or grayscale photographs, pictures, background textures, and other continuous-tone images. Because readability and contrast are less emphasized, these components are encoded at one-third the resolution (typically 100 dpi) using the IW44 wavelet-based compression method.
Foreground	Contains the color of text and line drawings. Because it consists of identically colored pixels (and, therefore, little variance), this layer is typically encoded at a low resolution using the IW44 wavelet-based compression method. NOTE: The foreground layer of two-layer color DjVu documents is encoded using the JB2 data compression method. See below.
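
To make the resolutions in the table concrete, the following short Python sketch computes the pixel dimensions of the mask and background layers. The 8.5 x 11 inch page size and the `layer_size` helper are assumptions made for illustration; only the 300 dpi and one-third (100 dpi) figures come from the table.

```python
def layer_size(width_in, height_in, dpi):
    """Pixel dimensions of a layer for a page measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


MASK_DPI = 300                    # full resolution, per the table
BACKGROUND_DPI = MASK_DPI // 3    # one-third resolution, per the table

# Assumed page size: US Letter, 8.5 x 11 inches.
mask_size = layer_size(8.5, 11, MASK_DPI)
background_size = layer_size(8.5, 11, BACKGROUND_DPI)

mask_pixels = mask_size[0] * mask_size[1]
background_pixels = background_size[0] * background_size[1]

print("mask:", mask_size)
print("background:", background_size)
print("pixel ratio:", mask_pixels // background_pixels)
```

Reducing the resolution by a factor of three in each direction means the background layer holds one-ninth as many pixels as the mask before any compression is applied.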

"Color JB2" DjVu Documents

An alternative to the three-layer color DjVu document is the color-jb2 format, in which the mask and foreground are combined into one layer. The foreground in the two-layer format is encoded by specifying one (palettized) solid color for each object described by the JB2-encoded mask. Such DjVu documents are rendered by painting each object on top of the background color image using the appropriate solid color. To produce the two-layer format, you must specify it on the command line using the jb2-format parameter of the documenttodjvu command.
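
The rendering step described above can be sketched in Python as follows. This is a simplified illustration, not the DjVu decoder: the `render_color_jb2` function and its data layout are hypothetical, and the background is assumed to have already been scaled to the mask resolution.

```python
def render_color_jb2(background, shapes, palette):
    """Paint each mask object over the background in its palette color.

    background: rows of (r, g, b) tuples, assumed already at mask resolution
    shapes:     list of (pixels, color_index) pairs, one per mask object,
                where pixels is a list of (row, col) positions
    palette:    list of (r, g, b) solid colors
    """
    page = [row[:] for row in background]  # copy so the input is untouched
    for pixels, color_index in shapes:
        color = palette[color_index]       # one solid color per object
        for row, col in pixels:
            page[row][col] = color
    return page


WHITE = (255, 255, 255)
palette = [(0, 0, 0), (200, 0, 0)]           # black and red, for illustration
background = [[WHITE] * 4 for _ in range(3)]  # tiny 3 x 4 background

shapes = [
    ([(0, 0), (1, 0), (2, 0)], 0),  # a vertical stroke drawn in black
    ([(1, 2), (1, 3)], 1),          # a short horizontal stroke drawn in red
]

page = render_color_jb2(background, shapes, palette)
for row in page:
    print(row)
```

Storing a single palette index per object, rather than a separate foreground image, is what allows the mask and foreground to be combined into one layer.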