theharvester
Gather email addresses, subdomains, and names
TLDR
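Gather information on a domain using Google:
theharvester -d domain -b google
Gather information on a domain using multiple sources:
theharvester -d domain -b bing,duckduckgo,crtsh
Change the limit of results to work with:
theharvester -d domain -b google -l 200
Save the output to two files in XML and HTML format:
theharvester -d domain -b google -f output_file
Display help:
theharvester -h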
SYNOPSIS
theharvester -d domain -b source [options]
PARAMETERS
-d domain
The target domain name to search for.
-b source
The data source(s) to use. Common sources include: all, baidu, bing, brave, censys, crtsh, dnsdumpster, duckduckgo, fullhunt, github-code, google, hunter, intelx, linkedin, linkedin_contacts, maxmind, netcraft, otx, pgp, rapiddns, rocketreach, securityTrails, shodan, spyse, subdomain_center, threatcrowd, trello, twitter, urlscan, virustotal, whatweb, zoomeye.
-l limit
Limit the number of results (default: 500). Some sources may have lower internal limits.
-f filename
Save the results to an HTML and/or XML file (e.g., results.html, results.xml).
-n
Perform a DNS reverse lookup on the discovered IP addresses.
-c
Perform a DNS brute force for hostnames.
-t
Perform a DNS TLD expansion brute force (e.g., .com, .org, .net).
-e dns_server
Use a specific DNS server for lookups.
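-s
Query Shodan for the discovered hosts (long form --shodan in 3.x and 4.x releases; requires a Shodan API key). Older 2.x releases used -s to set the starting result number instead.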
-p
Perform a port scan on discovered hosts using nmap (requires nmap installed).
-v
Verify discovered host names via DNS resolution and search for virtual hosts (long form --virtual-host in 3.x and 4.x releases).
--screenshot
Take screenshots of discovered web pages (requires Playwright and a web browser setup).
--take-over
Check for possible domain takeovers on discovered CNAME entries.
--dns-lookup
Perform DNS lookups for all discovered hosts to resolve IP addresses.
--shodan-lookup
Use Shodan to find open ports and banners for discovered hosts (requires a Shodan API key).
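When theharvester is driven from a script, the options above are usually assembled into an argument list first. The following is a minimal sketch in Python, not part of the tool itself: the function name build_command and its parameters are hypothetical, and it only uses the -d, -b, -l, and -f options described in this section.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    domain: str,
    sources: Sequence[str],
    limit: Optional[int] = None,
    output_file: Optional[str] = None,
    executable: str = "theharvester",  # assumption: name of the binary on PATH
) -> List[str]:
    """Assemble an argument list using the -d, -b, -l and -f options."""
    if not domain:
        raise ValueError("a target domain is required")
    if not sources:
        raise ValueError("at least one data source is required")

    # Multiple sources are passed to -b as one comma-separated value.
    command = [executable, "-d", domain, "-b", ",".join(sources)]
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        command += ["-l", str(limit)]
    if output_file:
        command += ["-f", output_file]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_command("example.com", ["bing", "crtsh"], limit=200)))
```

The returned list can be handed to subprocess.run(command, check=True). Passing a list rather than a shell string avoids quoting problems with unusual domain or file names.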
DESCRIPTION
theharvester is an open-source intelligence (OSINT) tool for gathering publicly available information about a target, such as email addresses, subdomains, hostnames, employee names, and open ports. It collects data from public sources including search engines (Google, Bing, Baidu, Yahoo), PGP key servers, LinkedIn, Twitter, and specialized services such as Shodan, Censys, and Netcraft.
The primary purpose of theharvester is to help penetration testers, security researchers, and red teams gather initial reconnaissance data on their target organizations. It automates the process of querying multiple data sources, consolidating the information, and presenting it in a structured format, which can then be used for further analysis, attack surface mapping, or social engineering efforts.
CAVEATS
theharvester relies heavily on public API access and search engine scraping, which leads to the following limitations:
1. API Keys: Many advanced sources (e.g., Shodan, Censys, Hunter.io) require valid API keys, which must be configured in the `api-keys.yaml` file (usually located in `~/.theharvester/`). Without these, certain searches will yield limited or no results.
2. Rate Limiting: Search engines and services often implement rate limiting. Excessive or frequent queries can lead to temporary IP blocks or CAPTCHA challenges, reducing the effectiveness of the tool.
3. Accuracy: The gathered information is only as accurate as the public sources it comes from. Data might be outdated, incorrect, or misleading. Always verify critical information.
4. Legality/Ethics: While OSINT is generally legal, the subsequent use of gathered information (e.g., for unauthorized access attempts) may not be. Always operate within legal and ethical boundaries, and ensure you have proper authorization for any target you are investigating.
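One way to reduce the impact of rate limiting is to query one source at a time and pause between queries. The sketch below illustrates that idea in Python. It is not a feature of theharvester: the function query_sources_with_pause, the run_source callback, and the 30 second default are all assumptions made for illustration, and run_source stands in for whatever code actually invokes the tool for a single source.

```python
import time
from typing import Callable, Dict, Iterable, List


def query_sources_with_pause(
    sources: Iterable[str],
    run_source: Callable[[str], List[str]],
    pause_seconds: float = 30.0,  # assumption: tune to the service's limits
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Run one query per source, pausing between consecutive queries."""
    results: Dict[str, List[str]] = {}
    for index, source in enumerate(sources):
        if index > 0:
            # Wait before every query except the first one.
            sleep(pause_seconds)
        results[source] = run_source(source)
    return results


if __name__ == "__main__":
    def fake_run(source: str) -> List[str]:
        # Stand-in for a real theharvester invocation against one source.
        return [f"host-from-{source}.example.com"]

    print(query_sources_with_pause(["bing", "crtsh"], fake_run, pause_seconds=0))
```

Keeping the sleep function injectable makes the pacing logic easy to test without waiting, and lets a caller swap in a randomized delay if a fixed interval proves too regular.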
API KEYS CONFIGURATION
For many advanced sources, theharvester requires API keys. These keys are stored in a file named `api-keys.yaml`, typically located in `~/.theharvester/`. The file must be created manually and populated with the keys for each service to be used, such as Shodan, Censys, or Hunter.io.
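Example entry for Shodan in api-keys.yaml, following the nested layout of the template shipped with recent releases (check the template for your version):
apikeys:
  shodan:
    key: YOUR_SHODAN_API_KEY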
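Before starting a long run, it can help to check which of the requested sources have no key configured. The following Python sketch shows one possible check. It assumes the keys have already been loaded into a plain mapping of service name to key; the function sources_missing_keys is hypothetical, and KEY_REQUIRED_SOURCES lists only the three services named on this page, not every source that needs a key.

```python
from typing import Iterable, List, Mapping, Optional

# Illustrative subset only: the services this page names as needing keys.
KEY_REQUIRED_SOURCES = frozenset({"shodan", "censys", "hunter"})


def sources_missing_keys(
    requested: Iterable[str],
    configured_keys: Mapping[str, Optional[str]],
) -> List[str]:
    """Return requested sources that need an API key but have none set."""
    missing = []
    for source in requested:
        name = source.lower()
        key = (configured_keys.get(name) or "").strip()
        # Treat an empty value or an unedited YOUR_... placeholder as unset.
        unset = not key or key.startswith("YOUR_")
        if name in KEY_REQUIRED_SOURCES and unset:
            missing.append(source)
    return missing


if __name__ == "__main__":
    keys = {"shodan": "YOUR_SHODAN_API_KEY", "hunter": "abc123"}
    print(sources_missing_keys(["bing", "shodan", "hunter", "censys"], keys))
```

Sources that do not need a key, such as bing in the example, are never reported, so the result can be used directly to warn the user or to drop those sources from the -b list.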
INSTALLATION AND DEPENDENCIES
theharvester is often pre-installed on penetration testing distributions like Kali Linux. If not, it can typically be installed via pip: pip3 install theharvester.
Some functionalities, like port scanning (-p), require additional tools such as Nmap to be installed on the system. Similarly, the screenshot feature (--screenshot) requires Playwright and a compatible browser to be set up.
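A wrapper script can verify these requirements before launching a scan. The sketch below is one way to do that in Python, under stated assumptions: the function missing_dependencies is hypothetical, Nmap is detected by looking for an nmap executable on PATH, and Playwright is detected only as an importable Python package, so a missing browser installation would not be noticed.

```python
import importlib.util
import shutil
from typing import Callable, Iterable, List, Optional


def missing_dependencies(
    options: Iterable[str],
    find_executable: Callable[[str], Optional[str]] = shutil.which,
    find_module: Callable[[str], object] = importlib.util.find_spec,
) -> List[str]:
    """List the external requirements that the chosen options lack."""
    selected = set(options)
    missing = []
    # -p relies on the nmap executable being on PATH.
    if "-p" in selected and find_executable("nmap") is None:
        missing.append("nmap")
    # --screenshot relies on the Playwright Python package being importable.
    if "--screenshot" in selected and find_module("playwright") is None:
        missing.append("playwright")
    return missing


if __name__ == "__main__":
    # Reports whichever of the two requirements is absent on this machine.
    print(missing_dependencies(["-p", "--screenshot"]))
```

The two lookup functions are parameters so the check can be exercised without changing the system, for example by passing a function that always returns None.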
HISTORY
theharvester was originally written by Christian Martorella and shipped with the BackTrack Linux distribution, the predecessor of Kali Linux. It gained popularity within the penetration testing and security research communities because it automates much of the initial reconnaissance phase.
Over the years, it has undergone continuous development, with new data sources being added and existing ones updated to maintain compatibility with changes in search engine algorithms and API endpoints. Its inclusion in major security distributions like Kali Linux has cemented its status as a go-to tool for OSINT.
SEE ALSO
maltego(1), nmap(1), recon-ng(1), metasploit(1), whois(1)