scrapy
TLDR
Create a project
Create a spider (in project directory)
Edit spider (in project directory)
Run spider (in project directory)
Fetch a webpage as Scrapy sees it and print the source to stdout
Open a webpage in the default browser as Scrapy sees it (disable JavaScript for extra fidelity)
Open the Scrapy shell for a URL to interact with the page source in a Python shell (or IPython, if available)
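The tasks above map to the commands below. This is a sketch with hypothetical names (myproject, myspider, example.com). Type the commands one at a time rather than running them as a script, because edit, view and shell are interactive.

```shell
# Hypothetical names: myproject, myspider, example.com.
scrapy startproject myproject            # create a project
cd myproject                             # genspider, edit and crawl are run from here
scrapy genspider myspider example.com    # create a spider from the default template
scrapy edit myspider                     # open the spider in an editor (see the EDITOR setting)
scrapy crawl myspider                    # run the spider
scrapy fetch https://example.com         # print the page source to stdout
scrapy view https://example.com          # open the page in a browser, as Scrapy downloaded it
scrapy shell https://example.com         # interactive console with `response` loaded
```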
SYNOPSIS
scrapy [command] [OPTIONS] ...
DESCRIPTION
Scrapy is controlled through the scrapy command-line tool, which provides several commands for different purposes. Each command has its own syntax, that is, its own set of arguments and options.
OPTIONS
fetch [OPTION] URL
Fetch a URL using the Scrapy downloader
- --headers
  Print response HTTP headers instead of body
runspider [OPTION] spiderfile
Run a spider defined in a single Python file, without needing a project
- --output=FILE
  Store scraped items to FILE in XML format
settings [OPTION]
Query Scrapy settings
- --get=SETTING
  Print raw setting value
- --getbool=SETTING
  Print setting value, interpreted as a boolean
- --getint=SETTING
  Print setting value, interpreted as an integer
- --getfloat=SETTING
  Print setting value, interpreted as a float
- --getlist=SETTING
  Print setting value, interpreted as a list
- --init
  Print initial setting value (before loading extensions and spiders)
shell URL | file
Launch the interactive scraping console
startproject projectname
Create a new project from the initial project template
--help, -h
Print command help and options
--logfile=FILE
Log file; if omitted, stderr is used
--loglevel=LEVEL, -L LEVEL
Log level (default: None, meaning the LOG_LEVEL setting applies, which is DEBUG unless overridden)
--nolog
Disable logging completely
--spider=SPIDER
Always use this spider when arguments are URLs
--profile=FILE
Write Python cProfile stats to FILE
--lsprof=FILE
Write lsprof profiling stats to FILE
--pidfile=FILE
Write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
Set/override setting (may be repeated)
AUTHOR
Scrapy was written by the Scrapy Developers.
This manual page was written by Ignace Mouzannar <mouzannar@gmail.com>, for the Debian project (but may be used by others).