xidel
HTML/XML/JSON data extraction tool
SYNOPSIS
xidel [-e expression] [--css selector] [options] input
DESCRIPTION
xidel is a command-line tool for extracting and querying data from HTML, XML, and JSON documents. It supports multiple query languages including XPath, XQuery, and CSS selectors, making it versatile for a wide range of data extraction tasks from both local files and remote URLs.
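A minimal sketch of basic extraction from a local file. It assumes xidel is installed and on the PATH. The file name page.html and its contents are made-up sample data. Each result is printed on its own line, and -s suppresses xidel's status messages.

```shell
# Work in a scratch directory so the sample file does not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample document to query.
cat > page.html <<'EOF'
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="one.html">One</a>
    <a href="two.html">Two</a>
  </body>
</html>
EOF

# Text content of the <title> element, selected with XPath.
title=$(xidel -s page.html -e '//title')
echo "$title"

# Attribute values: one line per matching href.
hrefs=$(xidel -s page.html -e '//a/@href')
echo "$hrefs"
```

A remote document is queried the same way, by passing a URL instead of a file name.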
XPath and XQuery expressions allow precise navigation of the document tree, while CSS selectors offer a familiar syntax for those used to web development. JSON input is queried with the same XPath/XQuery engine, extended with JSONiq-style access to objects and arrays. Multiple extraction expressions can be combined in a single invocation to gather several pieces of data at once.
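The query languages can be mixed in one run. This sketch assumes xidel is on the PATH and uses a hypothetical sample file, list.html. It first selects elements with a CSS selector, then evaluates two XPath expressions in the same invocation. Results are printed in the order the expressions are given.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical sample document with a small list.
cat > list.html <<'EOF'
<html><body>
  <ul class="items"><li>apple</li><li>pear</li></ul>
</body></html>
EOF

# CSS selector instead of XPath: prints the text of each matching <li>.
items=$(xidel -s list.html --css 'ul.items li')
echo "$items"

# Two -e expressions in one invocation: a count, then the first item.
both=$(xidel -s list.html -e 'count(//li)' -e '//li[1]')
echo "$both"
```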
The tool includes a link-following mode for web spidering: xidel follows the links selected by a --follow expression and applies the extraction expressions that come after it to each visited page. Output can be plain text (the default), JSON, XML, HTML, or shell variable assignments, which makes it easy to feed into data processing pipelines.
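A rough sketch of link following and structured output. It uses three hypothetical local pages in place of a live site and assumes xidel is on the PATH and resolves relative links against the starting file. The expression after -f selects the links to follow, and the -e after it is evaluated on each followed page. The json-wrapped output format collects the results as JSON. Its exact layout may vary between versions, so only the values are worth relying on.

```shell
# Scratch directory with an index page linking to two other pages.
cd "$(mktemp -d)" || exit 1
printf '<html><body><a href="a.html">A</a> <a href="b.html">B</a></body></html>\n' > index.html
printf '<html><head><title>Page A</title></head><body></body></html>\n' > a.html
printf '<html><head><title>Page B</title></head><body></body></html>\n' > b.html

# Follow every href on index.html, then print the <title> of each followed page.
titles=$(xidel -s index.html -f '//a/@href' -e '//title')
echo "$titles"

# Same crawl, with the results emitted as JSON.
json=$(xidel -s --output-format json-wrapped index.html -f '//a/@href' -e '//title')
echo "$json"
```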
PARAMETERS
-e, --extract EXPR
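XPath/XQuery expression to evaluate.
--css SELECTOR
CSS selector to evaluate.
-f, --follow EXPR
Follow the links selected by EXPR; later expressions are applied to each fetched page.
--output-format FORMAT
Output format (for example adhoc, json-wrapped, xml, html, bash).
--input-format FORMAT
Input format (for example auto, html, xml).
-s, --silent
Suppress status messages.
--user-agent UA
User-Agent header sent with HTTP requests.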
CAVEATS
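The query syntax, especially XQuery, has a steep learning curve. Very large documents can be slow to process. Documents with missing or incorrect encoding declarations may produce garbled characters.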
HISTORY
xidel was written by Benito van der Zander in Free Pascal as a command-line data extraction tool. It combines XPath, XQuery, CSS selectors, and JSON querying in a single utility.

