xidel is a command-line tool to download and extract data from HTML/XML pages.


Print all URLs found by a Google search

$ xidel [] --extract "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']"
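The extract() call above applies a regular expression to each @href and keeps capture group 1, the target of Google's /url?q=... redirect. The same extraction can be sketched stand-alone with sed; the hrefs below are hypothetical samples standing in for the //a/@href values of a real results page:

```shell
# Hypothetical sample hrefs, standing in for //a/@href values from a results page
hrefs='/url?q=https://example.com/page&sa=U
/url?q=https://another.example/&sa=U'

# Equivalent of extract(@href, 'url[?]q=([^&]+)&', 1): print only capture group 1
urls=$(printf '%s\n' "$hrefs" | sed -n 's/.*url?q=\([^&]*\)&.*/\1/p')
printf '%s\n' "$urls"
```

In xidel's regex the ? is written as [?] because a bare ? would be a quantifier; in the sed basic regular expression above, ? is already literal.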

Print the title of all pages found by a Google search and download them
$ xidel [] --follow "[//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']]" --extract [//title] --download ['{$host}/']

Follow all links on a page and print the titles, with XPath
$ xidel [] --follow [//a] --extract [//title]

Follow all links on a page and print the titles, with CSS selectors
$ xidel [] --follow "[css('a')]" --css [title]

Follow all links on a page and print the titles, with pattern matching
$ xidel [] --follow "[<a>{.}</a>*]" --extract "[<title>{.}</title>]"

Match a pattern against example.xml (this will also check that the element containing "ood" is there, and fail otherwise)
$ xidel [path/to/example.xml] --extract "[<x><foo>ood</foo><bar>{.}</bar></x>]"

Print all newest Stack Overflow questions with title and URL using pattern matching on their RSS feed
$ xidel [] --extract "[<entry><title>{title:=.}</title><link>{uri:=@href}</link></entry>+]"

Check for unread Reddit mail (web scraping combining CSS selectors, XPath, JSONiq, and automatic form evaluation)
$ xidel [] --follow "[form(css('form.login-form')[1], {'user': '$your_username', 'passwd': '$your_password'})]" --extract "[css('#mail')/@title]"


The trivial usage is to extract an expression from a webpage like:

xidel --extract //title

The next important option is --follow to follow links on a page. The following example will print the titles of all pages that are linked from the given page:

xidel --follow //a --extract //title


Xidel supports:

Extract expressions: there are a few different kinds of extract expressions:

o CSS 3 Selectors : to extract simple elements
o XPath 3 : to extract values and calculate things with them
o XQuery 3 : to create new documents from the extracted values
o JSONiq : to work with JSON apis
o Templates : to extract several expressions in an easy way, using an annotated version of the page for pattern matching
o Multipage templates : i.e. a file that contains templates for several pages
o XPath 2 / XQuery 1 : for legacy queries


Following:

o HTTP Codes : redirections like 30x are automatically followed, while keeping things like cookies
o Links : it can follow all links on a page as well as some extracted values

Output formats:

o Adhoc : just prints the data in a human-readable format
o XML : encodes the data in XML
o HTML : encodes the data in HTML
o JSON : encodes the data as JSON
o bash/cmd : exports the data as shell variables

Connections: HTTP / HTTPS as well as local files or stdin
Systems: Windows (using wininet), Linux (using synapse+openssl), Mac (with newest synapse)


Call xidel --help to see a list of supported command-line options.

Call xidel --usage for a full reference.


  1. Print all URLs found by a Google search:

     xidel --extract "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']"

  2. Print the titles of all pages found by a Google search and download them:

     xidel --follow "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']" --extract //title --download '{$host}/'

Generally, to follow all links on a page and print the titles of the linked pages:

With XPath : xidel -f //a -e //title

With CSS : xidel -f "css('a')" --css title

With Templates: xidel -f "<a>{.}</a>*" -e "<title>{.}</title>"

  1. If you have an example.xml file like "<x><foo>ood</foo><bar>IMPORTANT!</bar></x>", you can read the important part like:

xidel example.xml -e "<x><foo>ood</foo><bar>{.}</bar></x>"

(this will also check that the element containing "ood" is there, and fail otherwise)

  1. Calculate something with XPath:

     xidel -e "(1 + 2 + 3) * 1000000000000 + 4 + 5 + 6 + 7.000000008"
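The point of this example is the magnitude: at around 6e12, IEEE double precision can no longer represent the 0.000000008 fraction, so tools that compute with doubles drop it, while xidel's XPath decimal arithmetic should keep it. The awk run below (awk is just the comparison here, not part of xidel) shows the double-precision behaviour:

```shell
# Same expression in awk, which uses IEEE doubles: near 6e12 the spacing between
# adjacent doubles is roughly 0.001, so the 0.000000008 fraction is rounded away
result=$(awk 'BEGIN { printf "%.9f\n", (1 + 2 + 3) * 1000000000000 + 4 + 5 + 6 + 7.000000008 }')
printf '%s\n' "$result"
```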

  2. Print title and URL of the newest Stack Overflow questions:

     xidel -e "<A class='question-hyperlink'>{title:=text(), url:=@href}</A>*"

  3. Use a t:loop template to extract repeated blocks and follow the "next" link:

     xidel "" --extract "<t:loop><div class='usertext-body'><div>{outer-xml(.)}</div></div><ul class='flat-list buttons'><a><t:s>link:=@href</t:s>permalink</a></ul></div></div></t:loop>" --follow "<a rel='nofollow next'>{.}</a>?"

  4. Web scraping, combining CSS, XPath, JSONiq, and automatic form evaluation:

xidel -f "form(css('form.login-form')[1], {'user': '$your_username', 'passwd': '$your_password'})" -e "css('#mail')/@title"

Using the Reddit API:

xidel -d "user=$your_username&passwd=$your_password&api_type=json" --method GET '' -e '($json).data.has_mail'

  1. Generate a large table with XQuery and output it as XML:

     xidel --xquery '<table>{for $i in 1 to 1000 return <tr><td>{$i}</td><td>{if ($i mod 2 = 0) then "even" else "odd"}</td></tr>}</table>' --output-format xml
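For readers less used to XQuery, the same table can be sketched as a plain shell loop; the XQuery version above is what xidel actually runs:

```shell
# Build <table> rows 1..1000, labelling each number even or odd
rows=""
i=1
while [ "$i" -le 1000 ]; do
  if [ $((i % 2)) -eq 0 ]; then parity=even; else parity=odd; fi
  rows="$rows<tr><td>$i</td><td>$parity</td></tr>"
  i=$((i + 1))
done
table="<table>$rows</table>"
```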

  2. eval "$(xidel http://site -e 'title:=//title' -e 'links:=//a/@href' --output-format bash)"

This sets the bash variable $title to the title of the page, and $links becomes an array of all links on it.
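The mechanism can be tried without a live page: --output-format bash makes xidel print plain variable assignments, which eval then loads into the current shell. The assignment text below is a hand-written stand-in for that output (the exact quoting xidel emits may differ):

```shell
# Hand-written stand-in for the output of `xidel ... --output-format bash`
xidel_output="title='Example Page'"
eval "$xidel_output"
printf '%s\n' "$title"
```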


Use XIDEL_OPTIONS to set global command line options.
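For example, to apply an output format to every invocation in a session (the flag comes from the output-format list above; which flags you put here is up to your workflow):

```shell
# Every subsequent xidel call in this shell will see these options
export XIDEL_OPTIONS="--output-format bash"
```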


See also: wget(1), curl(1)


Author: Benito van der Zander, <>,

Download link:

You can test it online, or directly by sending a request to the CGI service, like <html><title>foobar</title></html>&extract=//title&raw=true
