nokogiri
Parse HTML and XML documents
TLDR
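Parse the contents of a URL or file
`nokogiri <url|path/to/file>`
Parse as a specific type
`nokogiri <url|path/to/file> --type <xml|html>`
Load a specific initialization file before parsing
`nokogiri <url|path/to/file> -C <path/to/config_file>`
Parse using a specific encoding
`nokogiri <url|path/to/file> --encoding <encoding>`
Validate using a RELAX NG file
`nokogiri <url|path/to/file> --rng <url|path/to/schema.rng>`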
SYNOPSIS
nokogiri [OPTIONS] <URI|PATH>
PARAMETERS
--type TYPE
Parses the input as the given type, `xml` or `html`.
-C FILE
Loads the given initialization file before parsing.
-E, --encoding ENCODING
Reads the input using the given encoding.
-e COMMAND
Runs the given Ruby code against the parsed document instead of starting an interactive session.
--rng <URI|PATH>
Validates the document against the given RELAX NG file.
--help, -?
Displays a help message and exits.
--version, -v
Prints the version information and exits.
URI|PATH
The URL or file path of the HTML or XML document to parse.
DESCRIPTION
Nokogiri is primarily a Ruby library for parsing and manipulating HTML and XML documents. The `nokogiri` gem also installs a small command-line executable of the same name.
The `nokogiri` command parses the document at the given URL or file path and, unless a script is supplied on the command line, opens an interactive Ruby (IRB) session with the parsed document available as `@doc`. It is useful for quickly inspecting a page or data feed, trying out CSS and XPath queries, or checking a document against a RELAX NG schema.
Comparing two documents is not part of this command. A structural `diff` method for Nokogiri documents is provided by the separate `nokogiri-diff` gem, which is a Ruby library and is not bundled with `nokogiri`.
CAVEATS
The `nokogiri` command is a thin wrapper for loading a document into a Ruby session, not a general-purpose command-line toolkit for XML/HTML processing. For anything beyond quick inspection or one-line scripts, such as complex data extraction, modifying structures, or comparing documents, use the Nokogiri library from a Ruby script. The command is available only when the `nokogiri` gem is installed, and it might not be on the default PATH depending on how Ruby was installed.
INSTALLATION
To use `nokogiri`, you must first have Ruby installed, and then install the `nokogiri` gem. This is typically done with `gem install nokogiri`, or by adding `gem 'nokogiri'` to a Gemfile and running `bundle install`. Nokogiri's C extension depends on libxml2 and libxslt. Recent releases provide precompiled native gems for common platforms that include these libraries; when the gem is compiled from source, the libraries (or a working build toolchain) must be available.
COMPARISON LOGIC
The separate `nokogiri-diff` gem adds a `diff` method to Nokogiri documents and nodes. It performs a structural comparison: rather than comparing text lines, it works on the tree-like structure of XML/HTML documents and reports changes to elements, attributes, and text content within the hierarchy. This makes it better suited to structured data than a simple line-by-line diff.
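The idea of a structural comparison can be sketched in a few lines of Ruby. The example below is only an illustration of the concept, not how any Nokogiri-based tool implements it. To stay runnable without extra gems it swaps in REXML, which ships with Ruby, in place of Nokogiri. The helper names `tree_diff`, `attributes_of`, and `own_text` are hypothetical, and children are naively paired by position, which a real tree-diff algorithm would not do.

```ruby
require "rexml/document"

# Collect an element's attributes as a plain name => value hash,
# so that attribute order does not matter when comparing.
def attributes_of(element)
  hash = {}
  element.attributes.each { |name, value| hash[name] = value }
  hash
end

# Text that belongs directly to this element, with surrounding
# whitespace (for example indentation) removed.
def own_text(element)
  element.texts.map(&:value).join.strip
end

# Walk two trees in parallel and collect human-readable differences.
# Children are paired by position; the bracketed number in a path is
# the child's position among its element siblings.
def tree_diff(a, b, path = "/#{a.name}", out = [])
  if a.name != b.name
    out << "#{path}: element <#{a.name}> became <#{b.name}>"
    return out
  end

  attrs_a = attributes_of(a)
  attrs_b = attributes_of(b)
  (attrs_a.keys | attrs_b.keys).sort.each do |key|
    next if attrs_a[key] == attrs_b[key]
    out << "#{path}/@#{key}: #{attrs_a[key].inspect} -> #{attrs_b[key].inspect}"
  end

  text_a = own_text(a)
  text_b = own_text(b)
  out << "#{path}/text(): #{text_a.inspect} -> #{text_b.inspect}" if text_a != text_b

  kids_a = a.elements.to_a
  kids_b = b.elements.to_a
  [kids_a.size, kids_b.size].max.times do |i|
    child_a = kids_a[i]
    child_b = kids_b[i]
    if child_a.nil?
      out << "#{path}: added <#{child_b.name}> at position #{i + 1}"
    elsif child_b.nil?
      out << "#{path}: removed <#{child_a.name}> at position #{i + 1}"
    else
      tree_diff(child_a, child_b, "#{path}/#{child_a.name}[#{i + 1}]", out)
    end
  end
  out
end

# The same feed, once on a single line and once pretty-printed with changes.
old_xml = '<feed version="1"><item id="a">Hello</item><item id="b">World</item></feed>'
new_xml = <<~XML
  <feed version="2">
    <item id="a">Hello</item>
    <item id="b">Earth</item>
    <item id="c">New</item>
  </feed>
XML

changes = tree_diff(REXML::Document.new(old_xml).root,
                    REXML::Document.new(new_xml).root)
puts changes
# /feed/@version: "1" -> "2"
# /feed/item[2]/text(): "World" -> "Earth"
# /feed: added <item> at position 3
```

A line-by-line diff would flag every line here, because the second document is formatted differently. The tree walk reports only the changed attribute, the changed text, and the added element, and leaves the unchanged first item alone.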
HISTORY
Nokogiri was first released in 2008, developed by Aaron Patterson and others. It rapidly became the de facto standard for XML and HTML parsing in the Ruby ecosystem thanks to its speed (it wraps the C libraries libxml2 and libxslt) and its expressive API. The `nokogiri` executable ships with the gem as a small convenience script for loading a document into an interactive session. The `nokogiri-diff` gem is distributed separately from the library.
SEE ALSO
diff(1): Compares two files line by line.
xmllint(1): A command-line XML tool from libxml2, often used for parsing, validating, and formatting XML.
tidy(1): A command-line tool for cleaning up and validating HTML.
gem(1): The RubyGems package manager command, used for installing and managing Ruby gems like `nokogiri`.
bundle(1): Bundler, a dependency manager for Ruby, often used to install gems including Nokogiri.