xgettext

Extract translatable strings from source code

TLDR

Scan file and output strings to messages.po

$ xgettext [path/to/input_file]

Use a different output filename

$ xgettext [[-o|--output]] [path/to/output_file] [path/to/input_file]

Append new strings to an existing file

$ xgettext [[-j|--join-existing]] [[-o|--output]] [path/to/output_file] [path/to/input_file]

Don't add a header containing metadata to the output file

$ xgettext --omit-header [path/to/input_file]

Display help

$ xgettext [[-h|--help]]

OPTIONS

-o FILE, --output=FILE
    Writes the extracted messages to FILE instead of the default messages.po (or NAME.po when -d NAME is given). If FILE is -, the output goes to standard output.

-d NAME, --default-domain=NAME
    Sets the default domain (basename of the output file) to NAME. For example, -d myapp writes myapp.po instead of messages.po.

-L NAME, --language=NAME
    Specifies the programming language of the input files. NAME can be 'C', 'C++', 'Java', 'Python', 'Perl', 'PHP', and others. Without this option, xgettext guesses the language from each input file's extension.

-k[WORD], --keyword[=WORD]
    Specifies an additional keyword (function name) to look for, besides the default keywords for the language. WORD must be attached directly to -k, with no space. If WORD is omitted, all default keywords are disabled.

-c[TAG], --add-comments[=TAG]
    Copies comment blocks that immediately precede a keyword line into the output file. If TAG is provided (e.g., 'TRANSLATORS:'), only comment blocks starting with that tag are copied; without TAG, all such comment blocks are copied.

-j, --join-existing
    Joins the extracted messages with an existing output file, which must already exist. New messages are added and existing ones are preserved. Marking obsolete messages is not part of this option; that is the job of msgmerge.

-s, --sort-output
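    Sorts the entries in the output file by their message ID (msgid). Sorted output is stable across runs, but it separates messages from their neighbors in the source, which makes it harder for translators to understand each message's context.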

-h, --help
    Displays a help message with available options and exits.

-V, --version
    Prints version information about xgettext and exits.
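
As a rough sketch of how several of these options combine, the commands below write a small C file and extract from it. The file name demo.c, its contents, and the TRANSLATORS: tag are illustrative assumptions, not anything xgettext requires.

```shell
# Hypothetical sample source: one gettext() call preceded by a tagged comment.
cat > demo.c <<'EOF'
#include <libintl.h>
#include <stdio.h>

int main(void) {
    /* TRANSLATORS: shown once at startup */
    puts(gettext("Welcome back"));
    return 0;
}
EOF

# Parse as C, keep comments tagged TRANSLATORS:, and write the template to demo.pot.
xgettext --language=C --add-comments=TRANSLATORS: --output=demo.pot demo.c

# With only a domain name, the output file name is derived from it: demo.po.
xgettext --default-domain=demo demo.c
```

The long option forms are used here for readability; the short forms (-L C, -cTRANSLATORS:, -o demo.pot, -d demo) should behave the same way.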

DESCRIPTION

xgettext is the string-extraction utility of the GNU Gettext internationalization (i18n) framework. It scans source code files in many programming languages (C, C++, Java, Python, Perl, PHP, and others) for marked translatable strings, which it recognizes as arguments to keyword functions such as gettext() or ngettext(). The default keywords depend on the language: _() is recognized by default in Python, for example, while in C the conventional _() and N_() macros must be declared with -k.

xgettext extracts these strings, along with any associated translator comments, and compiles them into a Portable Object Template (POT) file. By convention the template carries a .pot extension (for example messages.pot), although xgettext itself writes messages.po unless -o or -d is given. The POT file is the master template for all translatable messages in a project, and language-specific Portable Object (PO) files are created from it for translators to fill in. xgettext is therefore the first step in the localization workflow: it lets developers separate user-facing text from their code and hand it to translators in a standard format.

CAVEATS

xgettext relies on source code adhering to specific conventions for marking translatable strings. If a string is not wrapped in a keyword that xgettext recognizes for the input language (for example gettext(), or _() once it has been declared with -k_ for C), it will not be extracted.
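
The difference between marked and unmarked strings can be illustrated with a minimal sketch. The file status.c and its two strings are made up for this example.

```shell
# Hypothetical source with one marked and one unmarked string literal.
cat > status.c <<'EOF'
#include <libintl.h>
#include <stdio.h>

int main(void) {
    puts(gettext("Disk is full"));   /* marked: extracted */
    puts("Internal state dump");     /* unmarked: skipped */
    return 0;
}
EOF

xgettext --language=C --output=status.pot status.c

# List the extracted message IDs.
grep '^msgid' status.pot
```

Only the wrapped literal should show up in status.pot, next to the empty msgid that belongs to the header entry.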

While xgettext can parse many languages, its effectiveness depends on the quality of its language parsers and the consistency of the source code. It's not a translation tool itself; it only facilitates the extraction process.

Overly generic keyword definitions or unintended string literals in code can lead to the extraction of non-translatable text, increasing the burden on translators.

INTERNATIONALIZATION WORKFLOW ROLE

In a typical internationalization workflow, xgettext is the initial step. It generates the master template (.pot file). From this template, a language-specific translation file (.po file) is created for each target language, for example with msginit, and brought up to date against later templates with msgmerge. Translators fill in the .po files, which are then compiled into binary .mo files (using msgfmt) that applications can load at runtime to display translated messages.
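
The workflow above might look roughly like this on the command line. The file names, the fr locale, and the locale/ directory layout are assumptions for illustration, and msginit is only one common way to create the initial .po file.

```shell
# Hypothetical source file with a single translatable string.
cat > hello.c <<'EOF'
#include <libintl.h>
#include <stdio.h>

int main(void) {
    puts(gettext("Hello, world"));
    return 0;
}
EOF

# 1. Extract marked strings into the template.
xgettext --language=C --output=hello.pot hello.c

# 2. Create a French .po file from the template, without prompting for translator details.
msginit --no-translator --locale=fr --input=hello.pot --output-file=fr.po

# 3. A translator fills in the msgstr entries in fr.po.
#    After a later xgettext run, the file can be refreshed with:
#    msgmerge --update fr.po hello.pot

# 4. Compile the .po file into a binary catalog the application can load.
mkdir -p locale/fr/LC_MESSAGES
msgfmt --output-file=locale/fr/LC_MESSAGES/hello.mo fr.po
```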

CUSTOM KEYWORDS AND CONTEXT

xgettext supports custom keywords via the -k option, allowing developers to use their own wrapper functions for translatable strings. Strings can have the same content but different meanings depending on context (e.g., 'Open' as a verb vs. 'Open' as an adjective). For these, xgettext can also extract context information: a keyword specification such as -kNAME:1c,2 marks the first argument as the context and the second as the message, and the context is written to the output as a msgctxt line. Translators can then provide distinct translations for identical strings in different contexts.
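
A minimal sketch of both features follows. The wrapper names tr() and tr_ctx() are hypothetical project functions invented for this example; they are not part of gettext.

```shell
# Hypothetical source using project-specific wrappers instead of gettext().
cat > menu.c <<'EOF'
const char *tr(const char *msgid);
const char *tr_ctx(const char *context, const char *msgid);

void build_menu(void) {
    tr("Save");
    tr_ctx("verb", "Open");
    tr_ctx("adjective", "Open");
}
EOF

# -ktr:          treat the first argument of tr() as a message.
# -ktr_ctx:1c,2: argument 1 of tr_ctx() is the context, argument 2 the message.
xgettext --language=C -ktr -ktr_ctx:1c,2 --output=menu.pot menu.c

# Show the context lines and message IDs that were extracted.
grep -E '^(msgctxt|msgid)' menu.pot
```

With the context keyword in place, the two 'Open' strings should end up as separate entries, each preceded by its own msgctxt line.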

HISTORY

xgettext is a core component of the GNU Gettext internationalization system, which was developed to standardize and simplify the process of localizing software. GNU Gettext dates from the mid-1990s and is part of the GNU project of the Free Software Foundation. xgettext has evolved alongside the Gettext library, adding support for new programming languages and improving its parsing capabilities and options. The program was originally written by Ulrich Drepper; Bruno Haible has been the long-time maintainer of GNU Gettext, together with other contributors to the project.