LinuxCommandLibrary
GitHubF-DroidGoogle Play Store

nokogiri

ruby HTML/XML parser CLI

TLDR

Parse HTML file
$ nokogiri [file.html]
copy
Fetch and parse URL
$ nokogiri [https://example.com]
copy
Parse with CSS selector
$ nokogiri [file.html] -e "[doc.css('h1').text]"
copy
Parse XML
$ nokogiri [file.xml] --type xml
copy
Drop into an IRB session with the document bound to `doc`
$ nokogiri [file.html]
copy
Validate against a RelaxNG schema
$ nokogiri [file.xml] --rng [schema.rng]
copy

SYNOPSIS

nokogiri [options] [fileorurl]

DESCRIPTION

nokogiri is the command-line front-end for the Nokogiri Ruby gem, a fast HTML/XML parser backed by libxml2 and libxslt. The CLI parses a file, URL, or stdin into a Nokogiri::HTML::Document or Nokogiri::XML::Document (bound as doc) and either drops you into an IRB session or runs the Ruby snippet supplied with -e so you can query it with CSS selectors (doc.css) or XPath (doc.xpath).

PARAMETERS

FILEORURL

HTML/XML file path or URL to parse. If absent, the document is read from stdin.
-e CODE
Execute Ruby CODE against the parsed document (which is bound to doc).
--type TYPE
Document type: xml or html. Defaults to autodetection by content type / extension.
-C FILE
Load a custom Ruby initialization file. Default: ~/.nokogirirc.
-E, --encoding ENCODING
Read input using the named character encoding (e.g. UTF-8, ISO-8859-1).
--rng URIORPATH
Validate the document against the given RelaxNG schema.
-v, --version
Show the Nokogiri version.
-?, --help
Display help.

CAVEATS

Requires Ruby and the nokogiri gem (`gem install nokogiri`). The -i interactive flag is not part of the modern CLI — running nokogiri file on a TTY drops into IRB by default; pass -e to run non-interactively. Fetching URLs uses open-uri, so HTTPS sites need OpenSSL support in the underlying Ruby build.

HISTORY

Nokogiri (Japanese for "saw") was created by Aaron Patterson and Mike Dalessio in 2008 as a faster, libxml2-backed alternative to Hpricot. It is one of the most-installed Ruby gems and ships a small CLI for ad-hoc parsing and validation.

SEE ALSO

xmllint(1), pup(1), xidel(1)

Copied to clipboard
Kai