
lychee

Check URLs and files for broken links

TLDR

Scan a website for broken links

$ lychee [https://example.com]

Display a breakdown of error types
$ lychee --format detailed [https://example.com]

Limit the number of concurrent connections to avoid triggering rate limiting or DDoS protection
$ lychee --max-concurrency [5] [links.txt]

Check files in a directory structure for any broken URLs
$ grep [[-r|--recursive]] "[pattern]" | lychee -

Display help
$ lychee --help

SYNOPSIS

lychee [OPTIONS] [FILES_OR_URLS...]

PARAMETERS

--help, -h
    Display a help message and exit.

--version, -V
    Print version information and exit.

--config
    Specify a path to a configuration file (e.g., lychee.toml).

--base-url
    Set a base URL for resolving relative links.

--exclude
    Exclude links whose URLs match the given regular expression.

--include
    Only check links whose URLs match the given regular expression.

--no-progress
    Disable the progress bar during checks.

--retry-wait-time
    Minimum wait time in seconds between retries of failed requests.

--max-redirects
    Maximum number of redirects to follow for a single link.

--timeout
    Maximum time in seconds for a single HTTP request to complete.

--user-agent
    Set a custom User-Agent header for HTTP requests.

--format
    Specify the format of the status report (e.g., compact, detailed, json, markdown).

--output
    Write the status report to a file instead of standard output.

--offline
    Only check local files; do not make any network requests.

--exclude-all-private
    Do not check links to private IP ranges, link-local, or loopback addresses.

--accept
    Accept specific HTTP status codes as valid (e.g., 403, 500).

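These options combine freely. For example, to check a list of links with a longer timeout, a custom User-Agent, an exclusion pattern, and extra accepted status codes (the values shown are illustrative, not defaults):

$ lychee --timeout 30 --max-redirects 5 --user-agent "my-checker/1.0" --exclude "example\.org" --accept 200,429 links.txt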

DESCRIPTION

lychee is a fast, versatile command-line tool for checking the validity of links across many sources. Written in Rust, it is known for its speed, concurrency, and broad format support. It can scan URLs, local files, and standard input for broken links, dead redirects, and other connectivity issues, and it understands links embedded in HTML, Markdown, reStructuredText, and plain text. Its core purpose is to help developers, website administrators, and content creators maintain the integrity of their web resources and spare users from frustrating "404 Not Found" errors. It is often integrated into CI/CD pipelines to validate links automatically during development, helping keep website deployments free of broken references. The tool offers extensive customization options for timeouts, concurrency, request headers, and output formats.
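
For example, a CI job might check every Markdown file in a repository. The glob pattern below is illustrative; quoting it lets lychee expand it rather than the shell:

$ lychee --no-progress --format detailed "**/*.md"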

CAVEATS

Network Dependency: lychee requires network connectivity to check external URLs.
Rate Limiting: Checking many links on a single domain too quickly can trigger server-side rate limits, leading to temporary blocks or inaccurate results; see the throttling example after this list.
False Positives/Negatives: Complex website configurations, JavaScript-driven content, or strict firewalls might occasionally lead to links being misidentified as broken or valid.
Resource Usage: For very large projects with millions of links, lychee can consume significant network bandwidth and system resources due to its concurrent nature.
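
One way to stay under rate limits is to lower concurrency and wait longer between retries (the values here are illustrative):

$ lychee --max-concurrency 4 --retry-wait-time 10 links.txt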

CONFIGURATION FILES

lychee supports configuration files (e.g., lychee.toml) to store frequently used options, exclusions, and custom headers, simplifying complex checks and promoting consistency across projects.
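
A minimal lychee.toml sketch; the keys are assumed to mirror the equivalent command-line flags in snake_case, and exact names and accepted value types may vary between versions:

# lychee.toml (illustrative values; keys assumed to match CLI flags)
max_concurrency = 8
timeout = 20
max_redirects = 5
user_agent = "my-checker/1.0"
exclude = ["example\\.org"]
accept = ["200", "429"]

$ lychee --config lychee.toml https://example.com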

OUTPUT FORMATS

The tool provides flexible output options, including human-readable console output (compact or detailed), JSON for machine processing, and Markdown for reports, allowing users to integrate its findings into various reporting systems.
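
For instance, to produce a machine-readable report for downstream tooling (the file name is illustrative):

$ lychee --format json --output report.json https://example.com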

HISTORY

lychee emerged as a modern, high-performance link checker that leverages the speed and memory safety of the Rust programming language. It was created by Matthias Endler and is maintained by the lycheeverse community, with a focus on robust, automated link validation. Its design prioritizes concurrency and comprehensive error reporting, making it well suited to integration into continuous integration (CI) and continuous delivery (CD) workflows. Since its initial release, it has gained significant traction among developers and DevOps teams for its reliability and ease of use in maintaining healthy web assets.

SEE ALSO

curl(1), wget(1), grep(1), find(1), htmlproofer
