
gau

Fetch known URLs for a domain from AlienVault's Open Threat Exchange, the Wayback Machine, Common Crawl, and URLScan

TLDR

Fetch all URLs of a domain from AlienVault's Open Threat Exchange, the Wayback Machine, Common Crawl, and URLScan

$ gau [example.com]

Fetch URLs of multiple domains
$ gau [domain1 domain2 ...]

Fetch all URLs of several domains from an input file, running multiple threads
$ gau --threads [4] < [path/to/domains.txt]

Write [o]utput results to a file
$ gau [example.com] --o [path/to/found_urls.txt]

Search for URLs from only one specific provider
$ gau --providers [wayback|commoncrawl|otx|urlscan] [example.com]

Search for URLs from multiple providers
$ gau --providers [wayback,otx,...] [example.com]

Search for URLs within a specific date range
$ gau --from [YYYYMM] --to [YYYYMM] [example.com]

SYNOPSIS

gau [options] <domain> | --many

PARAMETERS

-b, --blacklist <extensions>
    Comma-separated list of extensions to blacklist (e.g., jpg,png,css).

-d, --dir <directory>
    Output results to the specified directory.

-o, --output <file>
    Write results to the specified file instead of standard output.

-p, --providers <names>
    Comma-separated list of providers to use (e.g., wayback,commoncrawl,otx). Default: all.

-s, --subs
    Include subdomains in the results.

-t, --threads <number>
    Number of concurrent threads to use (default: 10).

-v, --version
    Display the current version of gau.

--blacklist-domain <domains>
    Comma-separated list of domains to blacklist (e.g., cdn.example.com).

--delay <milliseconds>
    Delay between requests in milliseconds to prevent rate limiting.

--json
    Output results in JSON format.

--many
    Accept multiple domains via standard input, one domain per line.

--no-subs
    Exclude subdomains from the results, limiting output to the exact domain supplied.

--retries <number>
    Number of retries for failed HTTP requests (default: 10).

--timeout <seconds>
    HTTP client timeout in seconds (default: 10).

--verbose
    Enable verbose output, showing more details during execution.
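
Several of these flags are commonly combined in a single run. As an illustrative invocation (the values are placeholders, not recommendations), the following skips common static-asset extensions, queries only the Wayback Machine, and raises concurrency:
gau --blacklist jpg,png,css --providers wayback --threads 5 example.com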

DESCRIPTION

gau (Get All URLs) is a Go-based command-line tool designed to fetch known URLs for a specified domain. It aggregates URLs from various open-source intelligence (OSINT) services, including the Wayback Machine, Common Crawl, and Open Threat Exchange (OTX). This tool is particularly valuable for security researchers, penetration testers, and bug bounty hunters to quickly enumerate potential attack surfaces by discovering historical and archived URLs associated with a target domain.

It offers various options to refine the URL gathering process, such as filtering by extension, including or excluding subdomains, specifying which providers to query, and controlling concurrency for performance. The output can be written to a file, printed to the console, or formatted as JSON, making the tool easy to integrate into automated workflows. gau streamlines the initial reconnaissance phase by providing a comprehensive list of known URLs, significantly expanding the scope for further analysis or vulnerability discovery.
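
As a sketch of such an integration, assuming the JSON mode emits one object per line with a url field (verify against your gau version's actual output), results can be extracted and deduplicated with jq and sort:
gau --json example.com | jq -r '.url' | sort -u > urls.txt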

CAVEATS

gau is not a standard Linux distribution command; it's a third-party tool typically installed via Go. Its functionality relies heavily on external services (Wayback Machine, Common Crawl, OTX), meaning its performance and accuracy are dependent on the availability and rate limits of these services.

It can generate a large volume of URLs, requiring careful handling of output, and some returned URLs might be outdated or non-existent.
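
One illustrative way to handle both issues with standard tools is to deduplicate the output and then probe which URLs still respond; the HTTP 200 check below is a simple heuristic, not a gau feature:
gau example.com | sort -u > all_urls.txt
while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")  # fetch status code only
  [ "$code" = "200" ] && echo "$url"
done < all_urls.txt > live_urls.txt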

INSTALLATION

gau is typically installed using Go's package manager. Ensure Go is installed on your system, then run:
go install github.com/lc/gau/v2/cmd/gau@latest
This command will compile and place the gau executable in your $GOPATH/bin directory, which should be in your system's PATH.
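
If the shell cannot find gau afterwards, the Go binary directory is likely missing from PATH. These lines (standard Go tooling, not specific to gau) add it for the current session and confirm the install:
export PATH="$PATH:$(go env GOPATH)/bin"
gau --version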

USAGE EXAMPLES

Get URLs for a single domain:
gau example.com

Include subdomains in the results:
gau example.com --subs

Use specific providers and save output to a file:
gau example.com -p wayback,commoncrawl -o urls.txt

Process multiple domains from a file via stdin:
cat domains.txt | gau --many

Output in JSON format:
gau example.com --json
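
Because gau prints plain URLs to standard output, it composes naturally with the text tools listed under SEE ALSO. For example, to keep only URLs that carry query parameters (a rough pattern, adjust as needed):
gau example.com | grep -E '\?.+='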

HISTORY

gau was developed by Corben Leo (lc) as a Go-based tool to simplify and automate the process of gathering URLs from various archival and intelligence sources. It quickly gained popularity within the security community, particularly among penetration testers and bug bounty hunters, for its efficiency and ease of use in the initial reconnaissance phase. Its development reflects the community's need for aggregated intelligence on target domains.

SEE ALSO

curl(1), grep(1), awk(1), subfinder, assetfinder, katana
