po4a-gettextize

Convert document formats to gettext PO files

TLDR

Convert a text file to a PO file

$ po4a-gettextize --format [text] --master [path/to/master.txt] --po [path/to/result.po]

List all available formats

$ po4a-gettextize --help-format

Convert a text file along with its translated version to a PO file (the --localized option, short form -l, can be given multiple times)

$ po4a-gettextize --format [text] --master [path/to/master.txt] --localized [path/to/translated.txt] --po [path/to/result.po]

SYNOPSIS

po4a-gettextize [options] -f <fmt> -m <master_file> -p <output_po_file>
po4a-gettextize [options] -f <fmt> -m <master_file> -l <localized_file> -p <output_po_file>

-m, --master <file>
    Specifies the master (source) document to process. May be given more than once to gettextize several documents into one PO file.

-f, --format <fmt>
    Sets the format of the input document (e.g., 'man', 'docbook', 'sgml', 'tex', 'text'). The format is not guessed from the file; run po4a-gettextize --help-format to list the accepted names.

-o, --option <name[=value]>
    Passes an extra option to the format module selected with --format. Useful for fine-tuning parsing behavior; the valid options are listed in each module's documentation.

-p, --po <file>
    Specifies the file the PO message catalog is written to. If omitted, the catalog is written to standard output. Each entry carries a reference to its file and line of origin, which gives translators context.

-k, --keep-temps
    Retains the temporary master and localized POT files built before merging, useful for debugging.

--verbose, --debug
    Increases the verbosity or debug level of the output, providing more information about the extraction process.

-l, --localized <file>
    Specifies an existing translation of the master document. Its strings are paired with those of the master to pre-fill the PO file. May be given more than once when several master files are given.

-M, --master-charset <charset>, -L, --localized-charset <charset>
    Specify the character encoding of the master document and of the localized document, respectively.

--version
    Displays the version information of po4a-gettextize.

--help
    Shows a concise help message with available options.
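
The options above combine as in the following sketch, which gettextizes a small master document together with an existing translation. It assumes po4a is installed and skips the conversion otherwise; every file name is a hypothetical placeholder created in a temporary directory.

```shell
#!/bin/sh
# Sketch: gettextize a master document together with an existing translation.
# All file names below are hypothetical placeholders.

workdir=$(mktemp -d)
master="$workdir/master.txt"
localized="$workdir/master.fr.txt"
po="$workdir/fr.po"

# The two documents need the same structure: here, two paragraphs each.
printf 'Hello world.\n\nSecond paragraph.\n' > "$master"
printf 'Bonjour le monde.\n\nSecond paragraphe.\n' > "$localized"

if command -v po4a-gettextize >/dev/null 2>&1; then
    po4a-gettextize --verbose --format text \
        --master "$master" \
        --localized "$localized" \
        --po "$po" \
        || echo "po4a-gettextize reported a problem" >&2
    # Show the result, if one was written.
    if [ -f "$po" ]; then
        cat "$po"
    fi
else
    echo "po4a-gettextize is not installed; skipping the conversion" >&2
fi
```

Translations recovered this way are worth reviewing by hand before they are trusted, since the pairing relies on the two documents matching paragraph for paragraph.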

DESCRIPTION

po4a-gettextize is a utility from the po4a (PO for Anything) project that extracts translatable strings from documentation formats and stores them in a gettext PO (Portable Object) file. It is typically the first step in localizing documentation with the po4a toolchain.

The command parses the master document with the module for the format given by --format, such as man pages, DocBook, SGML, TeX, LaTeX, or plain text. It identifies the translatable units (e.g., section titles and paragraphs) and writes each one as an entry in the PO file. When only a master document is given, every entry has an empty translation, so the file serves as a template that can be distributed to translators, who then produce one PO file per language. When an existing translation is also given with --localized, its strings are paired with those of the master to pre-fill the translations.

Only the text goes into the PO file; the markup and layout stay in the master document. Translated strings can therefore be merged back into the original structure later without breaking the document's layout. Automating the extraction makes translations of large documentation sets easier to manage and update.

CAVEATS

po4a-gettextize does not detect the input format, so its results depend on passing the right value to --format. A wrong format can lead to incomplete or erroneous string extraction. Complex or malformed documents might not be parsed perfectly and may need manual adjustments or format-specific options (--option). Extraction quality matters because the strings in the generated PO file are what translators actually work with.
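
One rough way to spot incomplete extraction from a plain-text document is to compare the number of entries in the PO file with the number of paragraphs in the master. The sketch below uses hand-written stand-in files rather than real po4a output, and the helper names count_entries and count_paragraphs are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the number of PO entries with the number of paragraphs
# in the master document. Both files are hand-written stand-ins.

workdir=$(mktemp -d)

printf 'Hello world.\n\nSecond paragraph.\n' > "$workdir/master.txt"

cat > "$workdir/result.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: master.txt:1
msgid "Hello world."
msgstr ""

#: master.txt:3
msgid "Second paragraph."
msgstr ""
EOF

# Count PO entries, leaving out the header entry (the first, empty msgid).
count_entries() {
    total=$(grep -c '^msgid ' "$1")
    echo $((total - 1))
}

# Count paragraphs: blocks of text separated by blank lines.
count_paragraphs() {
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

entries=$(count_entries "$workdir/result.po")
paragraphs=$(count_paragraphs "$workdir/master.txt")

if [ "$entries" -eq "$paragraphs" ]; then
    echo "OK: $entries entries for $paragraphs paragraphs"
else
    echo "Mismatch: $entries entries for $paragraphs paragraphs" >&2
fi
```

This is only a heuristic: identical paragraphs share a single PO entry, so a lower entry count is not always an error, and other formats split their text into units differently.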

WORKFLOW INTEGRATION

po4a-gettextize is the first step in the typical po4a translation workflow. After it generates the PO template, translators fill it in to produce language-specific .po files. These .po files are then used by po4a-translate to generate the localized versions of the original document. When the original document changes, po4a-updatepo updates the existing .po files so that only new or modified strings need translating again.
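
The cycle described above can be sketched as a small script. The file names are hypothetical, and the run helper and DRY_RUN variable are illustrative additions rather than part of po4a; by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the gettextize, updatepo, translate cycle.
# File names are hypothetical; DRY_RUN=1 (the default) only prints the commands.

fmt=text
master=guide.txt
po=fr.po
translated=guide.fr.txt
DRY_RUN=${DRY_RUN:-1}

# Print a command, and execute it unless this is a dry run.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@"
    fi
}

translate_workflow() {
    # 1. First time only: extract the strings into a PO file.
    if [ ! -f "$po" ]; then
        run po4a-gettextize -f "$fmt" -m "$master" -p "$po"
    fi
    # 2. Whenever the master document changes: refresh the PO file.
    run po4a-updatepo -f "$fmt" -m "$master" -p "$po"
    # 3. Build the translated document from the master and the PO file.
    run po4a-translate -f "$fmt" -m "$master" -p "$po" -l "$translated"
}

translate_workflow
```

Setting DRY_RUN=0 executes the commands for real. po4a-translate skips writing the output when too little of the PO file is translated; its --keep option sets that threshold.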

SUPPORTED FORMATS

The command supports many document formats, including but not limited to: man pages, DocBook, SGML, TeX, LaTeX, plain text, pod (Plain Old Documentation), and XML. Some Debian-specific files such as debian/changelog are also handled. Run po4a-gettextize --help-format to list the formats available in the installed version. The accuracy of extraction depends on the quality of the format parser (called a 'module' within po4a) for that specific format.

HISTORY

The po4a project, of which po4a-gettextize is a core component, was initiated to bridge the gap between documentation formats and the Gettext translation system. Prior to po4a, localizing documentation often involved manual text extraction or complex, format-specific scripting. po4a-gettextize automates this extraction, enabling a consistent and scalable workflow for translating a wide array of document types, including man pages, which were notoriously difficult to localize systematically.