gocr
Perform optical character recognition (OCR) on images
TLDR
Recognize characters in the [i]nput image and write the [o]utput to the given file. Store the database ([p]) in path/to/db_directory (verify that the directory exists, or database usage is silently skipped). [m]ode 130 means create, use, and extend the database
Recognize characters and assume all [C]haracters are numbers
Recognize characters with a cert[a]inty of 100% (characters have a higher chance to be treated as unknown)
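The three recipes above might be invoked as in the following sketch. The paths are placeholders, and the run helper is a hypothetical wrapper (not part of gocr) that prints each command and executes it only when gocr and the input image are actually present. The digit range is written as 0-9, which is the range syntax the gocr manual describes for -C.

```shell
#!/bin/sh
# Placeholder paths (assumptions); replace with real files.
DB_DIR="path/to/db_directory/"   # the manual asks for the final slash
IN="path/to/input_image.pnm"
OUT="path/to/output_file.txt"

# Hypothetical helper: print the command, and run it only when gocr
# is installed and the input image exists.
run() {
    echo "$*"
    if command -v gocr >/dev/null 2>&1 && [ -f "$IN" ]; then
        "$@"
    fi
}

# 1. Create, use, and extend the database while recognizing text.
run gocr -m 130 -p "$DB_DIR" -i "$IN" -o "$OUT"
# 2. Restrict recognition to digits.
run gocr -i "$IN" -o "$OUT" -C "0-9"
# 3. Require a certainty of 100%.
run gocr -i "$IN" -o "$OUT" -a 100
```

Mode 130 prompts interactively for characters it cannot identify, so the first command is best run from a terminal rather than from an unattended script.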
SYNOPSIS
gocr [options] <image-file>
gocr [options] -i <image-file> [-o <output-file>]
PARAMETERS
-i <file>
Specifies the input image file. If omitted, the image file can be given as a direct argument after the options. A single dash (-) reads the image from standard input.
-o <file>
Specifies the output text file for the OCR result. If omitted, the recognized text is printed to standard output (stdout).
-C <string>
Recognizes only the characters listed in the string (e.g., '0-9' for digits only). Ranges such as 0-9 or a-z are allowed; use -- to match the minus sign. Characters outside the set are reported as unrecognized.
-a <certainty>
Sets the certainty threshold for recognition, in percent (0..100; default 95). Characters recognized with a lower certainty are treated as unrecognized.
-c <string>
Limits verbose (debugging) output on stderr to the characters in the string.
-u <string>
Sets the string printed for every unrecognized character (e.g., '?' or ' '). The default is an underscore '_'.
-d <size>
Sets the dust size in pixels; clusters smaller than this are removed before recognition. 0 disables dust removal, and the default of -1 selects autodetection.
-e <file>
Sends error and log messages to the given file instead of stderr. A single dash (-) sends them to stdout.
-f <format>
Specifies the output format of the recognized text. Accepted values are ISO8859_1, TeX, HTML, XML, UTF8, and ASCII.
-l <level>
Sets the grey-level threshold used to separate text from background (1..255). The default of 0 selects autodetection.
-m <mode>
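Sets the operation mode as a bitfield (default 0). Values listed in the gocr manual include 2 (use database), 4 (layout analysis), 8 (do not compare unrecognized characters to recognized ones), 16 (do not try to divide overlapping characters), 32 (no context correction), 64 (character packing), 130 (extend database, prompting for unidentified characters), and 256 (switch off the recognition engine, useful together with 2).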
-p <path>
Specifies the path to the character database directory, including the final slash (default ./db/). The directory must already exist; it is used by the database modes of -m.
-x <file>
Writes progress information to the given file, which can be a file name, a FIFO name, or a file descriptor number. gocr's usage output lists no rotation option, so incorrectly oriented scans need to be rotated with an external tool before OCR.
-s <num>
Sets the space width between words, in dots. The default of 0 selects autodetection.
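-v <verbosity>
Writes verbose diagnostics to stderr; the verbosity value is a bitfield that selects which details are printed. Version information is shown with -V.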
-h
Displays a brief help message with available options and their usage.
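Because the gocr manual describes -m as a bitfield, mode values can be combined by OR-ing them. The sketch below uses illustrative variable names (they are not gocr identifiers) and assumes, as the manual notes for mode 130, that 130 is 128 + 2.

```shell
#!/bin/sh
# Mode bits as listed in the gocr manual (variable names are illustrative).
USE_DB=2
LAYOUT=4
EXTEND_DB=128   # documented only in combination with USE_DB, giving 130

mode=$((USE_DB | EXTEND_DB))
echo "create + use + extend database: -m $mode"

mode=$((USE_DB | LAYOUT))
echo "database plus layout analysis: -m $mode"
```

The resulting number is passed to gocr as a single argument, for example gocr -m 6 -i image.pnm.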
DESCRIPTION
gocr (also known as JOCR) is a free and open-source optical character recognition (OCR) program that converts scanned images of text into plain text. Initiated by Joerg Schulenburg in 1999, it aims to provide a lightweight solution for text extraction from images. It reads the PNM family of formats (PBM, PGM, PPM) directly; other formats such as GIF, PNG, JPG, BMP, and TIFF are typically handled through external converters such as the NetPBM tools, or by converting the image first with ImageMagick.
While not as sophisticated or accurate as some modern OCR engines like Tesseract, gocr is valued for its simplicity, minimal dependencies, and its ability to run efficiently on systems with limited resources. It is particularly useful for basic OCR tasks or integration into scripts where a straightforward command-line OCR tool is required. Its output can be directed to standard output or a specified file, making it flexible for piping and further text processing in automated workflows.
CAVEATS
Accuracy: gocr is generally less accurate than modern OCR engines like Tesseract, especially on complex layouts, noisy images, or non-standard fonts.
Image Quality: It performs best with relatively clean, high-contrast, and well-scanned images. Poor image quality significantly degrades recognition results.
Language Support: gocr has no per-language option or dictionary. Recognition quality for accented characters and non-English text may vary and may not be as robust as that of more extensively trained tools.
Limited Features: Lacks advanced features such as comprehensive layout analysis, table recognition, or handwriting OCR, which are found in more sophisticated solutions.
INTEGRATION WITH IMAGE PROCESSING TOOLS
gocr often yields the best results when used in conjunction with image processing utilities like ImageMagick (e.g., convert) or NetPBM tools. These tools can be used to pre-process images by performing tasks such as deskewing, binarization, noise reduction, or contrast enhancement, which significantly improve gocr's OCR accuracy.
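A minimal pre-processing sketch, assuming ImageMagick's convert and gocr are both installed. The function name preprocess_and_ocr and the file scan.png are hypothetical, and the 40% deskew and 50% threshold values are starting points to tune per scan, not recommendations from the gocr documentation.

```shell
#!/bin/sh
# preprocess_and_ocr IMAGE: convert to grayscale, deskew, and binarize with
# ImageMagick, then pass the result to gocr as PNM on standard input.
preprocess_and_ocr() {
    convert "$1" -colorspace Gray -deskew 40% -threshold 50% pnm:- | gocr -i -
}

# Run only when both tools and the (hypothetical) scan are present.
if command -v convert >/dev/null 2>&1 && command -v gocr >/dev/null 2>&1 \
   && [ -f scan.png ]; then
    preprocess_and_ocr scan.png > scan.txt
fi
```

Writing PNM to the pipe avoids a temporary file and gives gocr a format it reads without helper programs.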
STANDARD INPUT/OUTPUT USAGE
gocr can read image data from standard input (stdin) when the input file is given as a single dash, and it writes recognized text to standard output (stdout) unless -o is used. This makes it easy to place in shell pipelines. For example, cat image.pnm | gocr - | less recognizes an image and pages through its text content.
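The same pattern extends to batch processing. The sketch below assumes the NetPBM tool pngtopnm is installed and converts every PNG in the current directory; the helper names out_name and ocr_stdin are hypothetical.

```shell
#!/bin/sh
# out_name FILE: derive the text file name from an image file name.
out_name() { printf '%s\n' "${1%.*}.txt"; }

# ocr_stdin: read a PNM image on stdin, write recognized text to stdout.
ocr_stdin() { gocr -i -; }

for f in ./*.png; do
    [ -f "$f" ] || continue                       # glob matched nothing
    command -v gocr >/dev/null 2>&1 || break      # gocr not installed
    command -v pngtopnm >/dev/null 2>&1 || break  # NetPBM not installed
    pngtopnm "$f" | ocr_stdin > "$(out_name "$f")"
done
```

Each image produces a text file next to it, for example ./scan.png yields ./scan.txt.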
HISTORY
gocr was initiated by Joerg Schulenburg in 1999 as one of the early attempts to create a free and open-source OCR solution for Linux and other Unix-like systems. Its development aimed to provide a lightweight command-line tool that could be easily integrated into scripts and workflows. Over the years, it has seen contributions from various developers, becoming a notable option for those seeking a simple OCR utility without heavy dependencies, particularly suitable for basic tasks and systems with limited resources.