Linux Command Library
commands
Commands
basic
Basic
tips
Tips

xidel

is a command line tool to download and extract data from HTML/XML pages.

The trivial usage is to extract an expression from a webpage like:

xidel http://www.example.org --extract //title

The next important option is --follow to follow links on a page. The following example will print the titles of all pages that are linked from http://www.example.org.

xidel http://www.example.org --follow //a --extract //title

Xidel supports:

Extract expressions: there are few different kind of extract expressions
o CSS 3 Selectors : to extract simple elements o XPath 3 : to extract values and calculate things with them o XQuery 3 : to create new documents from the extracted values o JSONiq : to work with JSON apis o Templates : to extract several expressions in an easy way using a annotated version of the page for pattern-matching o Multipage templates: i.e. a file that contains templates for several pages o XPath 2 / XQuery 1 : for legacy queries
Following:
o HTTP Codes : Redirections like 30x are automatically followed, while keeping things like cookies o Links : It can follow all links on a page as well as some extracted values
Output formats:
o Adhoc : just prints the data in a human readable format o XML : encodes the data in XML o HTML : encodes the data in HTML o JSON : encodes the data in a JSON o bash/cmd : exports the data as shell variables
Connections: HTTP / HTTPS as well as local files or stdin
Systems: Windows (using wininet), Linux (using synapse+openssl), Mac (with newest synapse)

Call xidel --help to see a list of support command line options.

Call xidel --usage for a full reference.

1. Print all urls found by a google search.

xidel

http://www.google.de/search?q=test --extract "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']"
2. Print the title of all pages found by a google search and download them:

xidel http://www.google.de/search?q=test --follow "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']" --extract //title --download '{$host}/'

Generally follow all links on a page and print the titles of the linked pages:

With XPath : xidel http://example.org -f //a -e //title

With CSS : xidel http://example.org -f "css('a')" --css title

With Templates: xidel http://example.org -f "<a>{.}</a>*" -e "<title>{.}</title>"

3. Another template example:

If you have an example.xml file like "<x><foo>ood</foo><bar>IMPORTANT!</bar></x>". You can read the important part like:

xidel example.xml -e "<x><foo>ood</foo><bar>{.}</bar></x>"

(and this will also check, if the part with the ood is there, and fail otherwise)

5. Calculate something with XPath:

xidel -e "(1 + 2 + 3) * 1000000000000 + 4 + 5 + 6 + 7.000000008"

6. Print all newest stackoverflow questions with title and url:

xidel http://stackoverflow.com -e "<A class='question-hyperlink'>{title:=text(), url:=@href}</A>*"

7. Print all reddit comments of an user, with html and url:

xidel "http://www.reddit.com/user/username/" --extract "<t:loop><div class='usertext-body'><div>{outer-xml(.)}</div></div><ul class='flat-list buttons'><a><t:s>link:=@href</t:s>permalink</a></ul></div></div></t:loop>" --follow "<a rel='nofollow next'>{.}</a>?"

8. Check if your reddit letter is red:

Webscraping, combining CSS, XPath, JSONiq and automatically form evaluation:

xidel http://reddit.com -f "form(css('form.login-form')[1], {'user': '$your_username', 'passwd': '$your_password'})" -e "css('#mail')/@title"

Using the Reddit API:

xidel -d "user=$your_username&passwd=$your_password&api_type=json" https://ssl.reddit.com/api/login --method GET 'http://www.reddit.com/api/me.json' -e '($json).data.has_mail'

9. Use XQuery, to create a html table of odd and even numbers:

xidel --xquery '<table>{for $i in 1 to 1000 return <tr><td>{$i}</td><td>{if ($i mod 2 = 0) then "even" else "odd"}</td></tr>}</table>' --output-format xml

10. Export variables to bash

eval "$(xidel http://site -e 'title:=//title' -e 'links:=//a/@href' --output-format bash)"

This sets the bash variable $title to the title of the page and $links becomes an array of all links there

Use XIDEL_OPTIONS to set global command line options.

Benito van der Zander, <benito_NOSPAM_benibela.de>, http://www.benibela.de

Download link: http://sourceforge.net/projects/videlibri/files/Xidel/

You can test it online on http://www.videlibri.de/cgi-bin/xidelcgi or directly by sending a request to the cgi service like http://www.videlibri.de/cgi-bin/xidelcgi?data=<html><title>foobar</title></html>&extract=//title&raw=true

wget(1), curl(1)

play store download app store download
Imprint
Sonnenallee 29, 12047 Berlin, Germany
e-mail: sschubert89@gmail.com

Privacy policy
Successfully copied