gau
Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, Common Crawl, and URLScan
TLDR
Fetch all URLs of a domain from AlienVault's Open Threat Exchange, the Wayback Machine, Common Crawl, and URLScan
Fetch URLs of multiple domains
Fetch all URLs of several domains from an input file, running multiple threads
Write [o]utput results to a file
Search for URLs from only one specific provider
Search for URLs from multiple providers
Search for URLs within a specific date range (sample commands for each of these scenarios follow this list)
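Representative commands for the scenarios above, built from the flags documented under PARAMETERS; the --from/--to date-range flags (YYYYMM format) are not listed there and are included as an assumption:
gau example.com
printf 'example.com\nexample.org\n' | gau --many
cat domains.txt | gau --many --threads 5
gau example.com -o found_urls.txt
gau example.com --providers wayback
gau example.com --providers wayback,commoncrawl,otx,urlscan
gau example.com --from 202301 --to 202312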
SYNOPSIS
gau [options] <domain>
gau [options] --many
PARAMETERS
-b, --blacklist <extensions>
Comma-separated list of extensions to blacklist (e.g., jpg,png,css).
-d, --dir <directory>
Output results to the specified directory.
-o, --output <file>
Write results to the specified file instead of standard output.
-p, --providers <names>
Comma-separated list of providers to use (e.g., wayback,commoncrawl,otx). Default: all.
-s, --subs
Include subdomains in the results.
-t, --threads <number>
Number of concurrent threads to use (default: 10).
-v, --version
Display the current version of gau.
--blacklist-domain <domains>
Comma-separated list of domains to blacklist (e.g., cdn.example.com).
--delay <milliseconds>
Delay between requests in milliseconds to prevent rate limiting.
--json
Output results in JSON format.
--many
Accept multiple domains via standard input, one domain per line.
--no-subs
Exclude subdomains from the results, limiting output to the exact domain supplied.
--retries <number>
Number of retries for failed HTTP requests (default: 10).
--timeout <seconds>
HTTP client timeout in seconds (default: 10).
--verbose
Enable verbose output, showing more details during execution.
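When a provider rate-limits aggressively, the request-tuning flags above can be combined; the values below are illustrative only:
gau example.com --threads 2 --delay 500 --retries 3 --timeout 30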
DESCRIPTION
gau (Get All URLs) is a Go-based command-line tool that fetches known URLs for a specified domain. It aggregates URLs from several open-source intelligence (OSINT) services, including the Wayback Machine, Common Crawl, and AlienVault's Open Threat Exchange (OTX). It is particularly valuable to security researchers, penetration testers, and bug bounty hunters who need to quickly enumerate potential attack surfaces by discovering historical and archived URLs associated with a target domain.
gau offers various options to refine URL gathering, such as filtering by extension, including or excluding subdomains, selecting which providers to query, and controlling concurrency. Output can be written to a file, printed to the console, or formatted as JSON, making the tool easy to integrate into automated workflows. By producing a comprehensive list of known URLs, gau streamlines the initial reconnaissance phase and significantly expands the scope for further analysis or vulnerability discovery.
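As a sketch of the automation use case described above, JSON output can be combined with jq; this assumes each emitted JSON object exposes a url field, which may vary between gau versions:
gau example.com --subs --json | jq -r '.url' | sort -u > example-urls.txt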
CAVEATS
gau is not a standard Linux distribution command; it's a third-party tool typically installed via Go. Its functionality relies heavily on external services (Wayback Machine, Common Crawl, OTX), meaning its performance and accuracy are dependent on the availability and rate limits of these services.
It can generate a large volume of URLs, requiring careful handling of output, and some returned URLs might be outdated or non-existent.
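Because result sets can be very large, it usually pays to deduplicate and gauge the volume before further processing; a minimal sketch using the documented -o flag and standard tools:
gau example.com -o raw-urls.txt
sort -u raw-urls.txt > unique-urls.txt
wc -l unique-urls.txt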
INSTALLATION
gau is typically installed with the Go toolchain. Ensure a recent version of Go is installed on your system, then run:
go install github.com/lc/gau/v2/cmd/gau@latest
This command compiles gau and places the executable in $GOBIN (by default $GOPATH/bin), which should be on your system's PATH.
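To confirm the binary is reachable after installation, the documented version flag can be used:
which gau
gau --version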
USAGE EXAMPLES
Get URLs for a single domain:
gau example.com
Include subdomains in the results:
gau example.com --subs
Use specific providers and save output to a file:
gau example.com -p wayback,commoncrawl -o urls.txt
Process multiple domains from a file via stdin:
cat domains.txt | gau --many
Output in JSON format:
gau example.com --json
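Blacklist common static-asset extensions and skip subdomains (an illustrative combination of the flags listed under PARAMETERS):
gau example.com --blacklist jpg,png,css --no-subs -o filtered-urls.txt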
HISTORY
gau was developed by Corben Leo (GitHub handle lc) as a Go-based tool to simplify and automate the process of gathering URLs from archival and threat-intelligence sources. It quickly gained popularity within the security community, particularly among penetration testers and bug bounty hunters, for its efficiency and ease of use during the initial reconnaissance phase. Its development reflects the community's need for aggregated intelligence on target domains.