httrack
Download websites for offline viewing
SYNOPSIS
httrack <URLs> [options] [+<filter>] [-<filter>]
PARAMETERS
--help
Display help information.
%n
Project name
%N
Full project path
%P
Proxy string
%s
Server name
%u
URL list (separated by commas)
%t
Number of connections
%c
Maximum number of connections (per second)
%l
Log file name
%e
Error file name
%O
Path for temporary files
%d
Local directory
%I
index.html filename
--near
Get non-HTML files located 'near' an HTML file (for example, an image referenced by a page but hosted elsewhere)
--get-html
Get HTML files
--get-images
Get images
--get-others
Get other (non-HTML, non-image) files
--depth <depth>
Maximum link depth (default 7)
--robots <0|1|2>
How strictly to follow robots.txt rules (0 = never, 1 = sometimes, 2 = always; default 2)
+*.gif +*.jpg +*.png +*.css +*.js
Accept-filter example: download only files matching these patterns
-mime:application/x-shockwave-flash
Reject-filter example: exclude files with this MIME type
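The filter and option syntax above can be combined in a single invocation. The sketch below uses a placeholder URL and output path, and wraps the command in a function so nothing is fetched unless you call it yourself; verify the exact option spellings against httrack --help on your version.

```shell
#!/bin/sh
# Sketch: combine accept/reject filters with depth and robots options.
# "www.example.com" and "/tmp/example-mirror" are placeholders.
mirror_assets_only() {
    # Quote the filters so the shell does not expand them as globs.
    httrack "https://www.example.com/" \
        -O /tmp/example-mirror \
        --depth=3 \
        --robots=2 \
        "+*.gif" "+*.jpg" "+*.png" "+*.css" "+*.js" \
        "-mime:application/x-shockwave-flash"
}

echo "mirror_assets_only defined (requires httrack to run)"
```

Quoting each filter matters: an unquoted `+*.gif` could be expanded by the shell against files in the current directory before httrack ever sees it.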
DESCRIPTION
HTTrack is a free (GPL) and easy-to-use offline browser utility. It lets you download a website from the Internet to a local directory, recursively building all directories and fetching HTML, images, and other files from the server to your computer. HTTrack preserves the site's relative link structure: simply open a page of the "mirrored" website in your browser and you can browse it from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored website and resume interrupted downloads. It is fully configurable and has an integrated help system.
It's primarily used for creating offline versions of websites for archival, analysis, or simply for offline browsing where internet access is limited or unreliable. Think of it as a tool to 'clone' a website onto your local machine.
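The mirror, update, and resume operations described above map onto invocations like the following sketch. The URL and output path are placeholders, and the --update and --continue long options should be checked against httrack --help on your version; the commands are wrapped in a function so nothing runs unless you call it.

```shell
#!/bin/sh
# Sketch of the mirror/update/resume workflow; "www.example.com" and the
# output path under $HOME/mirrors are placeholders.
mirror_example() {
    # Initial mirror: download the site into a local project directory.
    httrack "https://www.example.com/" -O "$HOME/mirrors/example"

    # Later: refresh the existing mirror in place.
    httrack --update -O "$HOME/mirrors/example"

    # Or: resume a mirror that was interrupted mid-download.
    httrack --continue -O "$HOME/mirrors/example"
}

echo "mirror_example defined (requires httrack to run)"
```

Because the project state lives under the -O path, update and resume runs only need the output directory, not the original URL list.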
CAVEATS
HTTrack can be resource-intensive, especially when mirroring large websites. Using it carelessly might put a strain on the target server or consume significant disk space locally. Respect robots.txt and website terms of service to avoid being blocked or causing harm.
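One way to reduce that strain is to cap connections and bandwidth and to reject large files. The flags below (-c for simultaneous connections, -A for maximum transfer rate in bytes per second) follow the httrack manual as I understand it; verify them with httrack --help, and treat the URL, path, and limits as placeholders.

```shell
#!/bin/sh
# Sketch: a deliberately gentle mirror. All values are placeholders.
polite_mirror() {
    # -c2: at most two simultaneous connections
    # -A25000: cap the transfer rate at ~25 000 bytes/second
    # --depth=2: keep the crawl shallow
    # "-*.iso" "-*.zip": reject-filters to skip large archives
    httrack "https://www.example.com/" \
        -O "$HOME/mirrors/example" \
        -c2 -A25000 --depth=2 \
        "-*.iso" "-*.zip"
}

echo "polite_mirror defined (requires httrack to run)"
```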
IMPORTANT CONSIDERATIONS
Always check the target website's terms of service and robots.txt file before using HTTrack. Be mindful of server load and avoid excessive requests. Use filtering options wisely to avoid downloading unnecessary content. Some websites may employ anti-scraping techniques that prevent or hinder HTTrack from working correctly.
HISTORY
HTTrack was created by Xavier Roche and first released in 1998. It has been actively maintained and improved over the years. HTTrack became popular as a way to archive and browse websites offline at a time when internet access was less pervasive or less reliable. It has remained a useful tool even with ubiquitous internet access due to its archival capabilities and ability to browse websites offline. It supports multiple platforms, including Windows and Linux.