xidel

HTML/XML/JSON data extraction tool

TLDR

Extract with XPath
$ xidel [file.html] -e "//title"
Extract from URL
$ xidel [https://example.com] -e "//h1"
CSS selector
$ xidel [file.html] --css "div.content"
Extract JSON
$ xidel [file.json] -e '$json/key'
Multiple extractions
$ xidel [file.html] -e "//title" -e "//h1"
Output JSON
$ xidel [file.html] -e "//a/@href" --output-format=json-wrapped
Follow links
$ xidel [url] -f "//a/@href" -e "//title"

SYNOPSIS

xidel [-e expression] [--css selector] [options] input

DESCRIPTION

xidel is a command-line tool for extracting and querying data from HTML, XML, and JSON documents. It supports multiple query languages including XPath, XQuery, and CSS selectors, making it versatile for a wide range of data extraction tasks from both local files and remote URLs.
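As a minimal sketch of mixing query languages in one call, the following writes a small sample page (the file contents and the variable name sample are made up for illustration) and queries it with two XPath expressions and one CSS selector. It assumes xidel is on the PATH and skips the query otherwise.

```shell
# Create a small sample page to query (hypothetical content).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<html><head><title>Demo page</title></head>
<body><h1>Heading</h1><div class="content">Body text</div></body></html>
EOF

# Run the queries only if xidel is installed.
if command -v xidel >/dev/null 2>&1; then
  # The temporary file has no .html extension, so state the input format explicitly.
  xidel -s --input-format=html "$sample" -e '//title' -e '//h1' --css 'div.content' \
    || echo "xidel exited with an error" >&2
fi
```

Single quotes around the expressions keep the shell from interpreting any characters inside them.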
XPath and XQuery expressions allow precise navigation of document structure, while CSS selectors provide a familiar syntax for those accustomed to web development. For JSON documents, xidel makes the parsed input available through the $json variable, which can be navigated with path-like expressions such as $json/key. Multiple extraction expressions can be combined in a single invocation for complex data gathering.
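JSON navigation can be sketched in the same way. The sample document and the variable name data below are hypothetical, and the example assumes xidel is installed. The expression is single-quoted because a double-quoted "$json" would be expanded by the shell before xidel sees it.

```shell
# Write a small JSON document to query (hypothetical content).
data="$(mktemp)"
printf '%s\n' '{"name": "xidel", "tags": ["html", "xml", "json"]}' > "$data"

# Run the query only if xidel is installed.
if command -v xidel >/dev/null 2>&1; then
  # Single quotes stop the shell from expanding $json.
  xidel -s --input-format=json "$data" -e '$json/name' \
    || echo "xidel exited with an error" >&2
fi
```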
The tool includes a link-following mode that enables web spidering, where xidel can traverse links on pages and apply extraction expressions to each visited page. Output can be formatted as plain text, JSON, or other structured formats, making it suitable for integration into data processing pipelines.
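One possible way to wrap the link-following mode for reuse in a script is a small shell function. The function name crawl_titles and the user-agent string are illustrative assumptions, not part of xidel. The function only combines the -s, --user-agent, -f, and -e options listed under PARAMETERS.

```shell
# Hypothetical helper: print the title of every page linked from a start URL.
# Usage: crawl_titles URL
crawl_titles() {
  # Illustrative user-agent string; pick one that identifies your script.
  ua="example-crawler/0.1"
  xidel -s --user-agent="$ua" "$1" -f '//a/@href' -e '//title'
}
```

A call such as crawl_titles https://example.com needs network access and should only be pointed at sites you are allowed to crawl.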

PARAMETERS

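-e, --extract EXPR
XPath or XQuery expression to evaluate against the input.
--css SELECTOR
CSS selector to evaluate against the input.
-f, --follow EXPR
Follow the links selected by EXPR and process each linked page.
--output-format FORMAT
Output format, such as adhoc, xml, html, xml-wrapped, or json-wrapped.
--input-format FORMAT
Input format, such as auto, html, xml, or json.
-s, --silent
Suppress status messages.
--user-agent UA
User-Agent string sent with HTTP requests.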

CAVEATS

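The combination of XPath, XQuery, and CSS selector syntax has a steep learning curve. Large documents may be slow to process, and character encoding problems can occur with some inputs.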

HISTORY

xidel was written by Benito van der Zander as a command-line data extraction tool that combines several query languages (XPath, XQuery, and CSS selectors) in one utility.

SEE ALSO

xmllint(1), jq(1), pup(1)
