Segmentation Tutorial Introduction
The Web tutorial is example-driven and targets the user who wants to rebalance
the tradeoff to avoid dropping any characters in the text. Note that
modifications that improve the appearance of the text may have a negative
impact on the compression ratio, on the time needed to segment one image, or
on the visual quality of any photographs in the document. The Document Express
segmenter was designed as a tradeoff between all these conflicting goals, but
the user can choose to favor one of them.
The parameters have a very specific meaning within the model that the Document Express segmenter tries to optimize. However, a precise understanding of this meaning may not help much, because each parameter interacts with the others in ways that are often complex and counter-intuitive. This is why teaching by example works best here. The most complex interaction is between segmentation and inversion (the process that decides when the foreground and background should be switched). In fact, we found this interaction too complex to summarize in a few examples.
The examples we give therefore do not involve any inversion. If all the foreground characters and drawings are darker than the background, no call to the inversion algorithm is necessary, and we recommend that you turn it off by setting the inversion level to 0. NOTE: When inversion is necessary, the meaning of one parameter (threshold-level) is also “inverted”.
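
The recommendation above can be pictured as a small change to a table of segmenter settings. The following Python sketch is illustrative only: it does not call the Document Express software, the key name "inversion-level" is an assumption modeled on the way the text spells threshold-level, and the starting values are hypothetical numbers, not product defaults.

```python
def disable_inversion(settings: dict) -> dict:
    """Return a copy of the settings with inversion switched off."""
    updated = dict(settings)  # copy, so the caller's settings are left unchanged
    # Per the text, an inversion level of 0 turns the inversion algorithm off.
    updated["inversion-level"] = 0
    return updated


# Hypothetical starting values, chosen only for illustration.
current = {"threshold-level": 75, "inversion-level": 25}
tuned = disable_inversion(current)
print(tuned)
```

Only the inversion level changes; the other parameters are left alone, so that each tutorial example can adjust them one at a time.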