httrack
TLDR
Mirror a website to the current directory
SYNOPSIS
httrack [url] [-options] [+filters] [-filters]
httrack --mirror url -O path
httrack --continue | --update
DESCRIPTION
httrack is a website copier that downloads websites to a local directory for offline browsing. It preserves the original site structure, converting links to work locally. The mirrored site can be browsed offline using any web browser.
The tool follows links to specified depths, downloads files, and reconstructs relative paths. It supports HTTP and HTTPS protocols, authentication, cookies, and proxy servers. Filters control which files are downloaded using wildcard patterns.
HTTrack can update previously mirrored sites, downloading only changed files. It handles interrupted downloads gracefully with the continue option. The webhttrack command provides a browser-based graphical interface.
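As a sketch of the basic workflow (URL and project path are placeholders), an initial mirror, a later refresh, and a resumed download might look like:

```shell
# Create a mirror of the site in the ./my-mirror project directory
httrack "https://example.com/" -O ./my-mirror

# Later, from inside the project directory, fetch only changed files
cd ./my-mirror && httrack --update

# Or resume a mirror that was interrupted mid-download
httrack --continue
```

Both `--update` and `--continue` read the cache left in the project directory, so they are run from (or pointed at) the original output path.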
PARAMETERS
-O, --path PATH
    Output/project path
-w, --mirror
    Mirror websites (default mode)
-W, --mirror-wizard
    Mirror websites with interactive wizard
-g, --get-files
    Get files without mirroring structure
-i, --continue
    Continue an interrupted download
-r N, --depth N
    Set link depth limit (default: unlimited)
-m N, --max-files N
    Maximum size in bytes for a non-HTML file
-M N, --max-size N
    Maximum total size in bytes
-E N, --max-time N
    Maximum mirror time in seconds
-A N, --max-rate N
    Maximum transfer rate (bytes/second)
-c N, --sockets N
    Number of simultaneous connections
-T N, --timeout N
    Connection timeout in seconds
-R N, --retries N
    Number of retry attempts
-P, --proxy HOST:PORT
    Use proxy server
-K N, --keep-links N
    Keep original link format (0=relative, 2=absolute)
-x, --replace-external
    Replace external links with error page
-n, --near
    Get non-HTML files near links
-t, --test
    Test links only, do not download
-q, --quiet
    Quiet mode (no questions asked)
-v, --verbose
    Verbose output
-s0, --robots=0
    Ignore robots.txt
-h, --help
    Display help
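To illustrate how these options combine (the numeric values and proxy host below are arbitrary examples; note that httrack conventionally takes numeric values attached to the flag, as in -r3):

```shell
# A "polite" mirror: depth 3, ~25 KB/s rate cap, 2 connections,
# 30-second timeout, 2 retries, routed through an internal proxy
httrack "https://example.com/" -O ./mirror \
    -r3 -A25000 -c2 -T30 -R2 \
    -P proxy.internal:8080
```

Capping the transfer rate and connection count reduces load on the target server and makes it less likely the mirror trips anti-scraping defenses.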
FILTERS
+pattern
    Include URLs matching pattern
-pattern
    Exclude URLs matching pattern
+*.pdf
    Include all PDF files
-*.exe
    Exclude all EXE files
+example.com/* -*
    Only mirror from the specified domain
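Filters are appended after the URL and options and are applied in the order given, so later rules can carve exceptions out of earlier ones. A sketch (hostnames are placeholders; quoting keeps the shell from expanding the wildcards):

```shell
# Mirror example.com, skip ZIP archives, but still fetch PDFs
# hosted on a separate CDN hostname
httrack "https://example.com/" -O ./mirror \
    "-*.zip" "+cdn.example.net/*.pdf"
```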
CAVEATS
Mirroring websites may violate terms of service or copyright laws. Always check robots.txt and site policies. JavaScript-rendered content and dynamically generated pages may not mirror correctly. Some sites employ anti-scraping measures that can block HTTrack. CGI scripts and server-side functionality will not work in the offline copy. Large sites can consume significant disk space and bandwidth.
HISTORY
HTTrack was created by Xavier Roche and first released in 1998. Written in C, it became one of the most popular open-source website mirroring tools. The project provides both command-line and GUI interfaces across Windows, Linux, and other Unix-like systems. Development continues with regular updates to handle modern web technologies.
SEE ALSO
wget(1), curl(1), webhttrack(1)