The ocr option generates a hidden text layer using the third-party OCR engine that ships with Document Express Enterprise.
Use the ocr option with the djvujoin or djvubundle command to perform OCR on a single-page or multiple-page DjVu® document. For example:
djvujoin --ocr singlepage.djvu newsinglepage.djvu
djvubundle --ocr multipage.djvu newmultipage.djvu
char – Controls the lowest level of the text. By default, the text layer is grouped by words (as identified by the OCR engine). If this option is selected, the text layer is grouped by characters. This option is necessary for many languages, such as the CJKV languages.
nosep – By default, separators are inserted between syntactic elements (as in English). If this option is selected, no separators are inserted. The presence of separators affects text searches, because separators (or the lack of them) in the search pattern must match those in the text layer. This option is necessary for CJKV languages.
lang – Language(s) to be recognized. The default is English. This option can be written in two forms. To specify a single language:
lang=Japanese
If multiple languages are to be recognized, write the string in the following form:
lang=(Japanese,English)
Each language specified should be either the name of the language or its number from the list of #defines below.
mixed – Enables the mixed Asian-Latin reading mode (see the IRIS documentation).
NOTE: The --ocr parameter cannot have embedded spaces, even if the string is enclosed in quotation marks. This appears to be a Windows OS restriction.
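The two forms of the lang option, together with the no-spaces restriction in the note above, can be sketched as a small helper. This is only an illustration: build_lang_option is a hypothetical name, not part of Document Express Enterprise, and it formats the lang string only. How lang combines with char, nosep, or mixed is not covered here.

```python
def build_lang_option(languages):
    """Format the lang sub-option for one or more languages.

    Hypothetical helper: one language gives "lang=Name", several give
    the parenthesized list form "lang=(Name1,Name2)".
    """
    if isinstance(languages, str):
        languages = [languages]
    names = [str(item) for item in languages]
    if not names:
        raise ValueError("at least one language is required")
    for name in names:
        # The --ocr parameter cannot contain embedded spaces.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("language must be non-empty with no spaces: %r" % name)
    if len(names) == 1:
        return "lang=" + names[0]
    return "lang=(" + ",".join(names) + ")"


if __name__ == "__main__":
    print("--ocr=" + build_lang_option(["Japanese"]))
    print("--ocr=" + build_lang_option(["Japanese", "English"]))
```

Rejecting whitespace before the string reaches the command line catches the embedded-space problem early, rather than relying on quoting that the note says does not help.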
To have the IRIS OCR module recognize the character set of a language other than English, specify the language string. The following are valid language strings:
#define AMERICAN 1
#define ENGLISH 1
#define GERMAN 2
#define FRENCH 3
#define SPANISH 4
#define ITALIAN 5
#define BRITISH 6
#define SWEDISH 7
#define DANISH 8
#define NORWEGIAN 9
#define DUTCH 10
#define PORTUGUESE 11
#define BRAZILIAN 12
#define GALICIAN 13
#define ICELANDIC 15
#define GREEK 17
#define CZECH 18
#define HUNGARIAN 19
#define POLISH 20
#define ROMANIAN 21
#define SLOVAK 22
#define CROATIAN 23
#define SERBIAN 24
#define SLOVENIAN 25
#define LUXEMB 28
#define FINNISH 29
#define TURKISH 30
#define LATIN 31
#define RUSSIAN 32
#define BYELORUSSIAN 33
#define UKRAINIAN 34
#define MACEDONIAN 35
#define BULGARIAN 36
#define JAPANESE 37
#define ESTONIAN 38
#define LITHUANIAN 39
#define LATVIAN 40
#define AFRIKAANS 41
#define ALBANIAN 42
#define CATALAN 43
#define IRISH_GAELIC 44
#define SCOTTISH_GAELIC 45
#define BASQUE 46
#define BRETON 47
#define CORSE 48
#define FRISIAN 49
#define NYNORSK 50
#define INDONESIAN 51
#define MALAY 52
#define SWAHILI 53
#define TAGALOG 54
#define KOREAN 55
#define SCHINESE 56
#define TCHINESE 57
#define QUECHA 59
#define AYMARA 60
#define FAROESE 61
#define FRIULIAN 62
#define GREENLANDIC 63
#define HAITIAN_CREOLE 65
#define RHAETO_ROMAN 66
#define SARDINIAN 67
#define KURDISH 68
#define CEBUANO 69
#define BEMBA 105
#define CHAMORRO 106
#define FIJAN 108
#define GANDA 109
#define HANI 110
#define IDO 111
#define INTERLINGUA 112
#define KICONGO 113
#define KINYARWANDA 114
#define MALAGASY 115
#define MAORI 117
#define MAYAN 118
#define MINANGKABAU 119
#define NAHUATL 120
#define NYANJA 121
#define RUNDI 123
#define SAMOAN 124
#define SHONA 125
#define SOMALI 126
#define SOTHO 127
#define SUNDANESE 128
#define TAHITIAN 129
#define TONGA 130
#define TSWANA 131
#define WOLOF 133
#define XHOSA 134
#define ZAPOTECO 135
#define JAVANESE 139
#define PIDGIN_NIGERIA 142
#define OCCITAN 143
#define MANX 144
#define TOK_PISIN 145
#define BISLAMA 146
#define HILIGAYNON 147
#define KAPAMPANGAN 149
#define BALINESE 150
#define BIKOL 151
#define ILOCANO 152
#define MADURESE 153
#define WARAY 154
#define SERBIAN_LATIN 155
For example:
djvubundle --ocr=lang=German mypages*.djvu mybundledfilewithocr.djvu
This instructs IRIS to recognize German characters. Several languages can be specified at one time:
djvubundle --ocr=lang=(German,French) mypages*.djvu mybundledfilewithocr.djvu
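Because a language may be given either by name or by its number from the list of #defines, a script that prepares these commands may want to check an identifier first. The following is a minimal sketch under stated assumptions: LANGUAGE_CODES and language_code are hypothetical names, the dictionary holds only an excerpt of the list, and matching names without regard to case is an assumption based on the mixed-case examples (lang=German) rather than documented behavior.

```python
# Excerpt of the #define list; add the remaining entries as needed.
LANGUAGE_CODES = {
    "AMERICAN": 1,
    "ENGLISH": 1,
    "GERMAN": 2,
    "FRENCH": 3,
    "JAPANESE": 37,
    "KOREAN": 55,
    "SCHINESE": 56,
    "TCHINESE": 57,
}


def language_code(identifier):
    """Return the numeric code for a language name or number.

    Hypothetical helper. Numbers are accepted only if they appear in
    LANGUAGE_CODES, so with this excerpt some valid codes are rejected.
    """
    text = str(identifier).strip()
    if text.isdigit():
        code = int(text)
        if code not in LANGUAGE_CODES.values():
            raise ValueError("unknown language number: %s" % text)
        return code
    # Assumption: names are matched case-insensitively.
    key = text.upper()
    if key not in LANGUAGE_CODES:
        raise ValueError("unknown language name: %r" % identifier)
    return LANGUAGE_CODES[key]


if __name__ == "__main__":
    for item in ("German", "37", "english"):
        print(item, "->", language_code(item))
```

Note that AMERICAN and ENGLISH share the number 1, so a lookup from number back to name is ambiguous for that entry.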
Supporting commands:
djvubundle, djvujoin