linkchecker

Check website links for validity

TLDR

Find broken links on https://example.com/
$ linkchecker [https://example.com/]

Also check URLs that point to external domains
$ linkchecker --check-extern [https://example.com/]

Ignore URLs that match a specific regular expression
$ linkchecker --ignore-url [regular_expression] [https://example.com/]

Output results to a CSV file
$ linkchecker --file-output [csv]/[path/to/file] [https://example.com/]

SYNOPSIS

linkchecker [options] URL...

PARAMETERS

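-F TYPE[/ENCODING][/FILENAME], --file-output TYPE[/ENCODING][/FILENAME]
    Write output of the given type to a file: FILENAME if given, otherwise linkchecker-out.TYPE. Types include text, html, and csv.

-o TYPE[/ENCODING], --output TYPE[/ENCODING]
    Set the format of the console output, for example text, html, or csv. The default is text.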

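-r NUMBER, --recursion-level NUMBER
    Check links recursively up to the given depth. A negative depth means unlimited recursion, which is also the default.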

-q, --quiet
    Quiet operation; an alias for -o none. Mainly useful together with -F.

--check-extern
    Also check external links. This is off by default.

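--no-check-extern
    Disable checking external links. Because this is already the default behavior, the option may not exist in every release; confirm with linkchecker --help.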

--ignore-url REGEX
    Ignore URLs matching the given regular expression.

--timeout SECONDS
    Set the timeout for connection attempts in seconds. The default is 60.

--version
    Show version information and exit.

--help
    Show help message and exit.

DESCRIPTION

The linkchecker command checks HTML documents and websites for broken links. Starting from one or more URLs, it follows links recursively and validates each one. It supports HTTP, HTTPS, FTP, mailto, and local file links. For every broken, redirected, or otherwise problematic link, it reports the parent document and the line number where the link appears. It honors robots.txt by default, can ignore URLs that match given patterns, and can write results in several formats, most commonly plain text, HTML, or CSV.

CAVEATS

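Checking links across the internet can be slow because it depends on the availability and responsiveness of remote servers. Some websites actively block automated link checkers, which can make working links appear broken. Complex robots.txt rules may be interpreted differently than a site owner intends. Recursion depth is unlimited by default, so a crawl of a large or dynamically generated site can run for a very long time or overwhelm a server; limit the depth with -r and use with caution.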
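
As an illustration of the caution above, the following is a minimal sketch of a bounded crawl. The function name polite_check and the specific values (depth 2, two threads, 30-second timeout) are assumptions chosen for the example, not recommendations from the linkchecker project.

```shell
# Sketch: a bounded crawl that limits depth, concurrency, and connection wait time.
# polite_check is a hypothetical helper name; the numeric values are illustrative.
polite_check() {
    url=$1
    depth=${2:-2}   # assumed default: follow links at most two levels deep
    linkchecker --recursion-level="$depth" --threads=2 --timeout=30 "$url"
}

# Example (requires linkchecker and network access):
#   polite_check https://example.com/ 1
```

Lowering the thread count reduces the number of simultaneous requests sent to the server, and the depth limit keeps the crawl from growing without bound.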

RETURN CODES

linkchecker exits with code 0 if all checks succeed. A non-zero exit code means that invalid links were found, that warnings were found while warnings are enabled, or that a program error occurred.
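
The exit code makes linkchecker easy to use in scripts. The sketch below assumes a hypothetical wrapper named check_links that runs whatever command it is given and reports the result; it does not distinguish broken links from program errors, since both are non-zero.

```shell
# Sketch: run a command and turn its exit code into a short report.
# check_links is a hypothetical helper; "$@" is the full command to run.
check_links() {
    if "$@"; then
        echo "all links OK"
    else
        rc=$?   # exit code of the command that just failed
        echo "broken links or an error (exit code $rc)" >&2
        return "$rc"
    fi
}

# Example (requires linkchecker and network access):
#   check_links linkchecker https://example.com/
```

Returning the original exit code lets a calling script or CI job fail in the same way linkchecker itself would.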

CONFIGURATION FILE

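linkchecker reads persistent settings from a configuration file, by default ~/.config/linkchecker/linkcheckerrc in current releases (older releases used ~/.linkchecker/linkcheckerrc). A different file can be selected with -f FILENAME. The file covers settings beyond the command-line options and is the place for persistent customizations such as ignore lists, user agents, and default timeout values.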
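
To show what such a file might contain, here is a minimal sketch that writes a small configuration to a path of your choice. The helper name write_linkcheckerrc, the user agent string, and the values are assumptions for illustration; the section and key names follow linkcheckerrc(5) as best understood here and should be verified against your installed version.

```shell
# Sketch: write an illustrative linkcheckerrc to the path given as $1.
# write_linkcheckerrc and ExampleLinkBot/1.0 are hypothetical names.
write_linkcheckerrc() {
    cat > "$1" <<'EOF'
[checking]
timeout=30
useragent=ExampleLinkBot/1.0

[filtering]
ignore=
  ^mailto:
EOF
}

# Example (requires linkchecker and network access):
#   write_linkcheckerrc ./linkcheckerrc
#   linkchecker --config=./linkcheckerrc https://example.com/
```

Writing to an explicit path and passing it with --config avoids overwriting an existing default configuration file.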

HISTORY

LinkChecker is an open-source project released under the GNU General Public License. It was originally written by Bastian Kleineidam and is now maintained by a community of contributors. Over time it has gained support for additional protocols, output formats, and configuration options, making it a versatile tool for web maintenance.

SEE ALSO

wget(1), curl(1)