linkchecker
Check website links for validity
TLDR
Find broken links on a website
Also check URLs that point to external domains
Ignore URLs that match a specific regular expression
Output results to a CSV file
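The four tasks above might look roughly like the following. This is a sketch rather than a definitive reference: the target https://example.com/, the pattern /private/, and the file name linkchecker-report.csv are placeholders, and the run helper is a hypothetical wrapper that only prints each command unless RUN=1 is set, so the block can be tried without crawling anything.

```shell
# Placeholder target (an assumption); replace with the site to check.
URL="https://example.com/"

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Find broken links on a website
run linkchecker "$URL"

# Also check URLs that point to external domains
run linkchecker --check-extern "$URL"

# Ignore URLs that match a regular expression (illustrative pattern)
run linkchecker --ignore-url '/private/' "$URL"

# Write results to a CSV file (output type and file name are joined with "/")
run linkchecker --file-output csv/linkchecker-report.csv "$URL"
```

Dropping the run prefix gives the plain command for interactive use.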
SYNOPSIS
linkchecker [options] URL...
PARAMETERS
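-F TYPE[/ENCODING][/FILENAME]
Write output to a file in the given format, for example text, html, or csv. If no file name is given, the file is named linkchecker-out.TYPE.
-o TYPE
Output format for the console. Possible values include text, html, and csv.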
-r NUMBER
Check links recursively up to the given depth. A negative depth means unlimited recursion, which is the default.
-q
Quiet mode, an alias for -o none. Mainly useful together with -F.
--check-extern
Check external links as well. By default, only links within the starting site are checked.
--no-check-extern
Disable checking external links.
--ignore-url REGEX
Ignore URLs matching the given regular expression.
--timeout SECONDS
Set timeout for connections in seconds.
--version
Show version information and exit.
--help
Show help message and exit.
DESCRIPTION
The linkchecker command analyzes HTML documents and websites to find broken links. Starting from a given URL, it recursively traverses the linked pages and checks each link for validity. Supported link types include HTTP, HTTPS, FTP, mailto, and local files. For every broken, redirected, or otherwise problematic link, it reports the source page and line number where the link was found. It can be configured to respect robots.txt, ignore certain URL patterns, and adjust other checking behavior. Results can be written in several formats, the most common being plain text, HTML, and CSV.
CAVEATS
Link checking across the internet can be slow, since it depends on the availability and responsiveness of remote servers. Some websites actively block link checkers. robots.txt rules can be complex and may be misinterpreted. Recursive crawling can easily run into effectively endless link chains or overwhelm a server; use it with caution.
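One way to keep a crawl bounded is to cap the recursion depth and the connection timeout. The sketch below assumes that -r accepts a numeric depth, as in current LinkChecker releases. The depth of 2, the 30-second timeout, and the URL are arbitrary illustrative values.

```shell
# Placeholder target (an assumption); replace with your own site.
URL="https://example.com/"

# Follow links at most two levels deep and stop waiting on a
# connection after 30 seconds.
set -- linkchecker -r 2 --timeout 30 "$URL"

# Show the assembled command; uncomment the last line to run it.
printf '%s\n' "$*"
# "$@"
```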
RETURN CODES
linkchecker exits with code 0 when every check succeeds. A non-zero exit code indicates that broken links were found or that an error occurred during the run.
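In a script, the exit status can be used to decide whether a run passed. The sketch below relies only on the zero versus non-zero distinction described above. The describe_status function and its messages are hypothetical.

```shell
# Hypothetical helper: turn a linkchecker exit status into a short message.
# Only the 0 / non-zero distinction is assumed here.
describe_status() {
    if [ "$1" -eq 0 ]; then
        echo "all checks passed"
    else
        echo "broken links or errors (exit status $1)"
    fi
}

# Possible usage after a run (placeholder URL):
#   linkchecker https://example.com/
#   describe_status "$?"
```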
CONFIGURATION FILE
linkchecker can be configured through a configuration file, allowing settings beyond the command-line options. The file is ~/.linkchecker/linkcheckerrc in older releases; recent releases look in $XDG_CONFIG_HOME/linkchecker/linkcheckerrc instead. It holds persistent customizations such as ignore lists, the user agent, and default timeout values.
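A minimal configuration might look like the sketch below, which writes a sample file to a scratch path rather than touching the real configuration. The file name linkcheckerrc.example, the timeout, the user agent string, and the ignore patterns are illustrative assumptions. The section and key names follow the INI layout of linkcheckerrc and are worth checking against the installed version.

```shell
# Write a sample configuration to a scratch file (illustrative name).
RC="./linkcheckerrc.example"
cat > "$RC" <<'EOF'
[checking]
# Connection timeout in seconds
timeout=30
# User-Agent string sent with requests (illustrative value)
useragent=ExampleLinkCheck/1.0

[filtering]
# Regular expressions for URLs to skip, one per indented line
ignore=
  ^mailto:
  /private/
EOF

# To try it (placeholder URL): linkchecker -f "$RC" https://example.com/
```

Passing the file explicitly with -f keeps the experiment separate from any existing user configuration.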
HISTORY
The linkchecker command has been developed and maintained as an open-source project. Its origins likely lie in the need for web developers and system administrators to automate the process of verifying link integrity on websites. Over time, it has evolved to support various protocols, output formats, and configuration options, making it a versatile tool for web maintenance.