roff2x
Convert roff/man pages to other formats
SYNOPSIS
roff2x [OPTIONS] [FILE...]
PARAMETERS
-format <format>
Specifies the desired output format. Common values include docbook, linuxdoc, and html. If omitted, a generic XML output is produced.
-output <file>
Writes the converted output to the specified file instead of standard output (stdout).
-tag <tag>
Defines the top-level tag name for generic XML/SGML output formats.
-noname
Suppresses the man tag in the output.
-novalid
Suppresses DOCTYPE or PUBLIC declarations. Useful when the output is not validated externally or validation is handled by other means.
-nohead
Omits the HEAD element from the generated output.
-nofoot
Omits the FOOT element from the generated output.
-nofrag
Disables fragment markers (often emitted as <br> tags), which otherwise break the output into smaller chunks.
-nohtml
Suppresses HTML-specific tags when producing generic output, so the result contains only XML/SGML markup.
-warn
Enables verbose warning messages during conversion, which helps locate problems in the input.
-debug
Enables debug output with detailed information about the conversion process.
FILE...
One or more input groff source files to be converted. If no files are specified, roff2x reads from standard input (stdin).
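A minimal usage sketch that combines several of the options listed above. The option names are copied from this page and have not been checked against an installed roff2x, so confirm them with roff2x --help first. The file names page.1 and page.xml and the RUN variable are hypothetical, chosen only for illustration. The script prints the command by default so it can be reviewed before anything runs.

```shell
#!/bin/sh
# Hypothetical file names, used for illustration only.
input="page.1"
output="page.xml"

# Option names come from the PARAMETERS list above (an assumption:
# verify them with `roff2x --help` on your system).
cmd="roff2x -format docbook -novalid -output $output $input"

# Dry run by default: print the command instead of executing it.
# Set RUN=1 to execute, provided roff2x is actually installed.
if [ "${RUN:-0}" = "1" ] && command -v roff2x >/dev/null 2>&1; then
    $cmd
else
    printf '%s\n' "$cmd"
fi
```

Leaving out -output would, per the description above, send the result to standard output, where it can be redirected or piped to another tool.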
DESCRIPTION
roff2x is a command-line utility that converts documents written for the groff (GNU roff) typesetting system into XML or SGML output formats. Its input is groff source, such as man pages and other technical documentation.
Its primary purpose is to facilitate the migration of legacy roff-formatted documentation into more modern, structured markup languages like DocBook XML or LinuxDoc SGML. This enables easier processing by other tools, integration into larger documentation frameworks, or publishing on the web.
While it can produce HTML output, groff -Thtml is often preferred for direct HTML generation. roff2x works with groff's preprocessors and handles tables (gtbl), equations (geqn), and pictures (gpic) embedded in the source document.
CAVEATS
roff2x is part of the groff package and may not be installed by default on minimal Linux distributions. Available options can differ between groff versions, so check roff2x --help on your system. Output quality and structure depend heavily on how consistently the input roff file uses its macros. For straightforward HTML conversion of man pages, groff -Thtml is often more direct and can yield better results.
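The groff -Thtml route mentioned above can be sketched as follows. This uses plain groff rather than roff2x. The sample page hello.1 and its contents are made up for illustration, and some minimal groff installations omit the HTML output device, in which case the conversion step fails.

```shell
#!/bin/sh
# Write a tiny man page into a temporary directory (illustrative content).
tmpdir=$(mktemp -d)
cat > "$tmpdir/hello.1" <<'EOF'
.TH HELLO 1
.SH NAME
hello \- print a greeting
.SH DESCRIPTION
Prints a short greeting.
EOF

# -man loads the man macro package; -Thtml selects the HTML output device.
if command -v groff >/dev/null 2>&1; then
    groff -man -Thtml "$tmpdir/hello.1" > "$tmpdir/hello.html"
else
    echo "groff not found; skipping conversion" >&2
fi
```

If the conversion succeeds, hello.html in the temporary directory can be opened in a browser.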
SUPPORTED OUTPUT FORMATS
Beyond generic XML/SGML, roff2x supports specific document type definitions (DTDs) such as DocBook XML and LinuxDoc SGML. The chosen format determines the structure and semantic tags of the generated output.
INPUT PREPROCESSOR INTEGRATION
roff2x accepts input files that use groff preprocessors. It can process documents containing tables (gtbl), mathematical equations (geqn), and embedded graphics (gpic), converting the preprocessed result into the target XML/SGML structure.
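For reference, groff itself selects these preprocessors with the flags -t (tbl), -e (eqn), and -p (pic). The sketch below runs a small tbl table through plain groff. It does not use roff2x, and whether roff2x accepts the same flags is not stated here. The file table.1 and its contents are hypothetical.

```shell
#!/bin/sh
# Create a small man page containing a tbl table (illustrative content).
tmpdir=$(mktemp -d)
cat > "$tmpdir/table.1" <<'EOF'
.TH TABLE 1
.SH DESCRIPTION
.TS
tab(;);
l l.
Format;Kind
docbook;XML
linuxdoc;SGML
.TE
EOF

# -t runs tbl before the formatter; -e and -p add eqn and pic the same way.
if command -v groff >/dev/null 2>&1; then
    groff -t -man -Tascii "$tmpdir/table.1" > "$tmpdir/table.txt"
else
    echo "groff not found; skipping" >&2
fi
```

Without -t, the .TS/.TE block is not expanded by tbl and the table rows are not laid out as columns.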
HISTORY
roff2x is part of the GNU groff project, a modern reimplementation of the classic troff typesetting system. It was developed to bridge traditional roff-formatted documentation and contemporary structured markup standards, reflecting the shift from purely print-oriented formatting to machine-readable document formats.