LinuxCommandLibrary

theharvester

Gather email addresses, subdomains, and names

TLDR

Gather information on a domain using Google

$ theHarvester --domain [domain_name] --source google

Gather information on a domain using multiple sources
$ theHarvester --domain [domain_name] --source [duckduckgo,bing,crtsh]

Change the limit of results to work with
$ theHarvester --domain [domain_name] --source [google] --limit [200]

Save the output to two files in XML and HTML format
$ theHarvester --domain [domain_name] --source [google] --file [output_file_name]

Display help
$ theHarvester --help

SYNOPSIS

theharvester -d domain -b source [options]

PARAMETERS

-d domain
    The target domain name to search for.

-b source
    The data source(s) to use, given as one name or a comma-separated list (e.g., duckduckgo,bing,crtsh). Common values include: all, baidu, bing, brave, censys, crtsh, dnsdumpster, duckduckgo, fullhunt, github-code, google, hunter, intelx, linkedin, linkedin_contacts, maxmind, netcraft, otx, pgp, rapiddns, rocketreach, securityTrails, shodan, spyse, subdomain_center, threatcrowd, trello, twitter, urlscan, virustotal, whatweb, zoomeye.

-l limit
    Limit the number of results (default: 500). Some sources may have lower internal limits.

-f filename
    Save the results to files in XML and HTML format, named after filename (e.g., results.xml, results.html).

-n
    Perform a DNS reverse lookup on the discovered IP addresses.

-c
    Perform a DNS brute force for hostnames.

-t
    Perform a DNS TLD expansion brute force (e.g., .com, .org, .net).

-e DNS server
    Use a specific DNS server for lookups.

-s
    Perform a DNS server enumeration on the target domain.

-p
    Perform a port scan on discovered hosts using nmap (requires nmap installed).

-v
    Verify host names via DNS resolution and search for virtual hosts.

--screenshot
    Take screenshots of discovered web pages (requires Playwright and a web browser setup).

--take-over
    Check for possible domain takeovers on discovered CNAME entries.

--dns-lookup
    Perform DNS lookups for all discovered hosts to resolve IP addresses.

--shodan-lookup
    Use Shodan to find open ports and banners for discovered hosts (requires a Shodan API key).
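A minimal sketch of how the core options above combine into one invocation, written as a small Python helper that only builds the argument list and does not run the tool. The helper name build_command and the example.com target are hypothetical; only -d, -b, -l, and -f are used, as described in this section.

```python
import shlex


def build_command(domain, sources, limit=None, output_file=None):
    """Return the argument list for a single theHarvester run."""
    if not domain:
        raise ValueError("a target domain is required")
    if not sources:
        raise ValueError("at least one source is required")
    # -b takes one comma-separated value, as in the multi-source example.
    args = ["theHarvester", "-d", domain, "-b", ",".join(sources)]
    if limit is not None:
        args += ["-l", str(limit)]
    if output_file is not None:
        args += ["-f", output_file]
    return args


cmd = build_command("example.com", ["duckduckgo", "bing", "crtsh"], limit=200)
# Print a shell-safe rendering of the command for review before running it.
print(shlex.join(cmd))
```

Building a list rather than a single string keeps each value a separate argument, so the result can be handed to subprocess.run without shell quoting concerns.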

DESCRIPTION

theharvester is an open-source intelligence (OSINT) tool for gathering publicly available information about a target: email addresses, subdomains, hostnames, employee names, and open ports. It collects data from public sources including search engines (Google, Bing, Baidu, Yahoo), PGP key servers, LinkedIn, Twitter, and specialized services such as Shodan, Censys, and Netcraft.

The primary purpose of theharvester is to help penetration testers, security researchers, and red teams gather initial reconnaissance data on their target organizations. It automates the process of querying multiple data sources, consolidating the information, and presenting it in a structured format, which can then be used for further analysis, attack surface mapping, or social engineering efforts.
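To make the consolidation step concrete, here is a rough Python sketch of merging hostnames reported by several sources into one deduplicated, in-scope list. It illustrates the idea only and is not theharvester's internal logic; the function name consolidate_hosts and the sample data are hypothetical.

```python
def consolidate_hosts(domain, *host_lists):
    """Merge hostnames from several sources, dropping duplicates and
    anything that is not the target domain or one of its subdomains."""
    domain = domain.lower().rstrip(".")
    seen = set()
    for hosts in host_lists:
        for host in hosts:
            # Normalize case and a trailing dot so equal names compare equal.
            name = host.strip().lower().rstrip(".")
            if name == domain or name.endswith("." + domain):
                seen.add(name)
    return sorted(seen)


# Hypothetical results from two different sources.
from_search = ["www.example.com", "Mail.Example.com", "cdn.example.net"]
from_certs = ["mail.example.com.", "vpn.example.com"]
print(consolidate_hosts("example.com", from_search, from_certs))
```

Matching on "." + domain rather than a bare suffix avoids accepting unrelated names such as notexample.com.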

CAVEATS

theharvester relies heavily on public API access and search engine scraping.

1. API Keys: Many advanced sources (e.g., Shodan, Censys, Hunter.io) require valid API keys, which must be configured in the api-keys.yaml file (usually located in ~/.theharvester/). Without these, certain searches will yield limited or no results.
2. Rate Limiting: Search engines and services often implement rate limiting. Excessive or frequent queries can lead to temporary IP blocks or CAPTCHA challenges, reducing the effectiveness of the tool.
3. Accuracy: The gathered information is only as accurate as the public sources it comes from. Data may be outdated, incorrect, or misleading, so verify anything critical before acting on it.
4. Legality/Ethics: While OSINT is generally legal, the subsequent use of gathered information (e.g., for unauthorized access attempts) may not be. Always operate within legal and ethical boundaries, and ensure you have proper authorization for any target you are investigating.
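One common way to cope with the rate limiting described above is to wait progressively longer between retries. The sketch below shows that pattern in Python under stated assumptions: the action and is_blocked callables are hypothetical placeholders for "run one query" and "did the response look like a block or CAPTCHA", and the delay values are arbitrary starting points, not limits published by any service.

```python
import time


def backoff_delays(base=2.0, factor=2.0, cap=60.0, attempts=5):
    """Yield wait times in seconds that grow geometrically, capped at cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def run_with_backoff(action, is_blocked, sleep=time.sleep, **kwargs):
    """Call action(); while the result looks blocked, wait and retry.

    Returns the last result, which may still be blocked if every
    retry was used up.
    """
    result = action()
    for delay in backoff_delays(**kwargs):
        if not is_blocked(result):
            return result
        sleep(delay)
        result = action()
    return result


print(list(backoff_delays(base=2, factor=2, cap=10, attempts=4)))
```

The sleep function is a parameter so the retry logic can be exercised without actually waiting.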

API KEYS CONFIGURATION

Many advanced sources only return results when an API key is supplied. Keys are stored in a file named api-keys.yaml, typically located in ~/.theharvester/. If the file does not exist, create it and add a key for each service you intend to query, such as Shodan, Censys, or Hunter.io.

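Example entry for Shodan in api-keys.yaml (recent releases nest each service under a top-level apikeys mapping; check the template shipped with your version):
apikeys:
  shodan:
    key: YOUR_SHODAN_API_KEY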
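As a rough illustration of why missing keys matter, the Python sketch below separates a requested source list into sources that can run and sources that would be skipped for lack of a key. It assumes the keys have already been read into a plain dictionary (file parsing is omitted), and the KEY_REQUIRED set contains only the three services this page names, so it is not a complete list.

```python
# Assumption: only the services this page names as needing keys.
KEY_REQUIRED = {"shodan", "censys", "hunter"}


def split_sources(requested, configured_keys):
    """Return (usable, skipped) source lists, preserving request order."""
    usable, skipped = [], []
    for source in requested:
        # An absent or empty key both count as "not configured".
        if source in KEY_REQUIRED and not configured_keys.get(source):
            skipped.append(source)
        else:
            usable.append(source)
    return usable, skipped


# Hypothetical configuration: Shodan has a key, Hunter is left blank.
keys = {"shodan": "YOUR_SHODAN_API_KEY", "hunter": ""}
usable, skipped = split_sources(["bing", "shodan", "hunter", "censys"], keys)
print("usable:", usable)
print("skipped (no key):", skipped)
```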

INSTALLATION AND DEPENDENCIES

theharvester is often pre-installed on penetration testing distributions like Kali Linux. If not, it can typically be installed via pip: pip3 install theharvester.

Some functionalities, like port scanning (-p), require additional tools such as Nmap to be installed on the system. Similarly, the screenshot feature (--screenshot) requires Playwright and a compatible browser to be set up.
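A quick way to check these prerequisites before a run is sketched below in Python. It looks for executables on PATH and for an importable playwright module; the function name check_dependencies is hypothetical, and the check confirms presence only, not that a browser has been set up for Playwright.

```python
import importlib.util
import shutil


def check_dependencies(executables=("theHarvester", "nmap"),
                       modules=("playwright",)):
    """Report which executables are on PATH and which modules import."""
    report = {}
    for name in executables:
        report[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for name, present in check_dependencies().items():
    print(f"{name}: {'found' if present else 'missing'}")
```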

HISTORY

theharvester was initially developed by Christian Martorella as part of the BackTrack Linux distribution (now Kali Linux). It quickly gained popularity within the penetration testing and security research communities due to its simplicity and effectiveness in automating the initial reconnaissance phase.

Over the years, it has undergone continuous development, with new data sources being added and existing ones updated to maintain compatibility with changes in search engine algorithms and API endpoints. Its inclusion in major security distributions like Kali Linux has cemented its status as a go-to tool for OSINT.

SEE ALSO

maltego(1), nmap(1), recon-ng(1), metasploit(1), whois(1)
