
scrapy

Scrape websites to extract structured data

TLDR

Create a project
$ scrapy startproject [project_name]

Create a spider (in project directory)
$ scrapy genspider [spider_name] [website_domain]

Edit spider (in project directory)
$ scrapy edit [spider_name]

Run spider (in project directory)
$ scrapy crawl [spider_name]

Fetch a webpage as Scrapy sees it and print the source to stdout
$ scrapy fetch [url]

Open a webpage in the default browser as Scrapy sees it (disable JavaScript for extra fidelity)
$ scrapy view [url]

Open a Scrapy shell for the given URL, which allows interaction with the page source in a Python shell (or IPython if available)
$ scrapy shell [url]

SYNOPSIS

scrapy <command> [options] [args]

PARAMETERS

startproject <project_name>
    Creates a new Scrapy project with a basic directory structure and configuration files. This is usually the first command used when starting a new scraping task.

genspider <name> <domain>
    Generates a new spider file inside the current Scrapy project. Spiders define how to crawl a site and extract data, specifying start URLs and parsing rules.
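
    The generated file is a small Python class. The sketch below approximates the default template produced by, say, scrapy genspider example example.com (the class name, spider name, and domain are placeholders):

        import scrapy

        class ExampleSpider(scrapy.Spider):
            name = "example"
            allowed_domains = ["example.com"]
            start_urls = ["https://example.com"]

            def parse(self, response):
                # Extraction logic goes here; the template yields nothing yet.
                pass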

crawl <spider_name> [-o output_file]
    Starts the crawling process using a specific spider defined within the project. It is the primary command to run your scraper and collect data. With -o, scraped items are appended to a feed file whose format (e.g. JSON, JSON Lines, CSV, or XML) is inferred from the file extension.

shell [<url>]
    Opens an interactive Scrapy shell, which is a powerful tool for testing and debugging your XPath or CSS selectors on specific URLs directly, without running the entire spider.
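
    Inside the shell the downloaded page is exposed as response, and a few helper functions are available. A typical interactive session might look like the following (the selectors are only illustrative):

        response.status                          # HTTP status code of the fetched page
        response.css("title::text").get()        # text of the first <title> element
        response.xpath("//a/@href").getall()     # all link targets on the page
        fetch("https://example.com/other")       # download another URL into the shell
        view(response)                           # open the current response in a browser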

parse <url> [--spider=spider_name]
    Fetches a given URL and parses it with the specified spider (or with the project spider that handles that URL, if none is given), showing how items and requests would be extracted. Useful for debugging parsing logic.

settings [--get=setting_name]
    Displays the Scrapy settings for the current project. Can be used to inspect individual setting values or all configured settings.

runspider <file.py>
    Runs a spider defined in a single Python file, without needing to create a full Scrapy project. Useful for quick, standalone scraping scripts.
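
    A complete, self-contained spider suitable for runspider might look like the sketch below (the target site and field names are illustrative). Saved as quotes_spider.py, it could be run with scrapy runspider quotes_spider.py -o quotes.json to write the scraped items to a JSON file:

        import scrapy

        class QuotesSpider(scrapy.Spider):
            # Standalone spider: no project needed, run with `scrapy runspider`.
            name = "quotes"
            start_urls = ["https://quotes.toscrape.com"]

            def parse(self, response):
                # Yield one item per quote block on the page.
                for quote in response.css("div.quote"):
                    yield {
                        "text": quote.css("span.text::text").get(),
                        "author": quote.css("small.author::text").get(),
                    }
                # Follow the pagination link, if there is one.
                next_page = response.css("li.next a::attr(href)").get()
                if next_page:
                    yield response.follow(next_page, callback=self.parse)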

version [-v]
    Displays the Scrapy version. The -v option provides more verbose information, including Python, Twisted, and lxml versions.

bench
    Runs a quick benchmark test to check Scrapy's performance on your system, measuring request/response processing speed.

DESCRIPTION

Scrapy is an open-source web crawling framework written in Python. It provides a fast and powerful way to extract data from websites, process it, and store it in various formats. Built on an asynchronous architecture (Twisted), it efficiently handles numerous concurrent requests, making it suitable for large-scale data extraction tasks. Scrapy offers built-in functionalities for managing cookies, handling user agents, respecting robots.txt, and navigating complex website structures. Its extensible design, through middlewares, pipelines, and extensions, allows developers to tailor the scraping process to specific needs, making it a versatile tool for data mining, content aggregation, and automated testing.
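
Much of this behaviour is configured declaratively rather than coded by hand. For example, a project's settings.py commonly tunes politeness and concurrency with a handful of well-known settings (the values below are only illustrative, not recommendations):

    # settings.py (excerpt)
    ROBOTSTXT_OBEY = True       # respect robots.txt rules
    CONCURRENT_REQUESTS = 16    # maximum number of concurrent requests
    DOWNLOAD_DELAY = 0.5        # seconds to wait between requests to the same site
    COOKIES_ENABLED = True      # let Scrapy manage cookies automatically
    USER_AGENT = "mybot (+https://example.com/bot-info)"  # identify the crawler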

CAVEATS

Scrapy is a Python framework, not a native Linux command in the traditional sense; it's invoked via the Python environment. It requires Python 3.7+ and pip for installation. Large-scale crawling can be resource-intensive and may require careful management of system resources and network bandwidth. Websites may implement anti-bot measures, leading to IP blocking or CAPTCHAs, which require advanced handling strategies.

INSTALLATION

Scrapy is installed using pip, the Python package installer. The command is:
pip install scrapy
It's recommended to install it within a Python virtual environment to manage dependencies.
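For example, a typical setup inside a virtual environment looks like this (the directory name is arbitrary):
python3 -m venv .venv
source .venv/bin/activate
pip install scrapy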

PROJECT STRUCTURE

A Scrapy project typically consists of multiple components: spiders (defining the crawling logic), items (structured data containers), pipelines (for processing scraped items), and settings.py (for project-wide configuration). The startproject command sets up this basic structure.
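
For a project created with scrapy startproject myproject (the name is illustrative), the generated layout looks roughly like this:

    myproject/
        scrapy.cfg            # deploy/configuration file
        myproject/            # the project's Python module
            __init__.py
            items.py          # item definitions
            middlewares.py    # spider and downloader middlewares
            pipelines.py      # item pipelines
            settings.py       # project-wide settings
            spiders/          # directory where spiders live
                __init__.py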

EXTENSIBILITY

Scrapy offers numerous extension points for customization:
- Item Pipelines: Process items after they have been scraped (a minimal example is sketched after this list).
- Downloader Middleware: Process requests and responses as they pass through the downloader.
- Spider Middleware: Process responses on their way into spiders, and the items and requests spiders produce.
- Extensions: Hook custom functionality into Scrapy's core.
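
As a small example of the first of these extension points, a minimal item pipeline might validate scraped items and drop incomplete ones (the module path and field name below are illustrative). It is then enabled through the ITEM_PIPELINES setting, where lower numbers run earlier:

    # pipelines.py -- a minimal validation pipeline (illustrative)
    from scrapy.exceptions import DropItem

    class RequiredFieldsPipeline:
        def process_item(self, item, spider):
            # Called for every item produced by any spider in the project.
            if not item.get("text"):
                raise DropItem("missing text field")
            return item

    # settings.py -- enable the pipeline (priority 0-1000, lower runs first)
    # ITEM_PIPELINES = {"myproject.pipelines.RequiredFieldsPipeline": 300}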

HISTORY

Scrapy was first released in 2008 and is maintained by Zyte (formerly Scrapinghub), a company specializing in web scraping. It emerged from the need for a robust, flexible, and scalable framework for complex web data extraction projects. Built upon the Twisted asynchronous networking library, it was designed to handle high concurrency efficiently. Over the years, Scrapy has evolved significantly, adding new features, improving performance, and fostering a vibrant open-source community, making it a de-facto standard for web crawling in Python.

SEE ALSO

curl(1), wget(1), python(1), pip(1)
