theharvester

Gather email addresses, subdomains, and names

TLDR

Gather information on a domain using Google

$ theHarvester --domain [domain_name] --source google

Gather information on a domain using multiple sources
$ theHarvester --domain [domain_name] --source [duckduckgo,bing,crtsh]

Change the limit of results to work with
$ theHarvester --domain [domain_name] --source [google] --limit [200]

Save the output to two files in XML and HTML format
$ theHarvester --domain [domain_name] --source [google] --file [output_file_name]

Display help
$ theHarvester --help

SYNOPSIS

theharvester -d domain.com -l limit -b source

PARAMETERS

-d domain.com
    Domain to search or company name.

-l limit
    Limit the number of results to work with (integer).

-b source
    Data source: baidu, bing, bingapi, censys, crtsh, dnsdumpster, dogpile, google, google-certificates, googlecse, googleplus, hunter, intelx, linkedin, otx, pentesttools, projectdiscovery, qwant, securitytrails, shodan, threatcrowd, twitter, virustotal, yahoo, yandex. Use "all" to query every source.

-s start
    Start with result number X (integer).

-v
    Verify host names via DNS resolution and search for virtual hosts.

-n
    Do a DNS reverse lookup on all ranges discovered.

-c
    Perform a DNS brute force for the domain.

-t
    Perform a DNS TLD expansion discovery.

-p
    Perform port scanning on discovered hosts.

-g
    Run Google dorking module.

-m file
    Save the unified output to an XML and JSON file.

-h
    Use SHODAN database to query discovered hosts.

-w file
    Save only the hosts found to a file.

-e dns server
    Use this DNS server for lookups.

-f
    Save to HTML file.

-u file
    Verify host name via DNS resolution and search for virtual hosts.

-z
    Do not use DNS resolution.

-ddns
    Use passive DNS resolution.

-a
    Query all data sources.
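
The short options in the SYNOPSIS combine on a single command line. The following is a minimal sketch rather than part of the tool: build_cmd is a hypothetical helper, and example.com, 200, and bing are placeholder values. It only prints the command so that it can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a theHarvester command line
# built from the -d, -l and -b options listed under PARAMETERS.
build_cmd() {
    domain=$1
    limit=$2
    src=$3
    # -l expects an integer, so reject anything else early.
    case $limit in
        ''|*[!0-9]*) echo "limit must be an integer" >&2; return 1 ;;
    esac
    printf 'theHarvester -d %s -l %s -b %s\n' "$domain" "$limit" "$src"
}

# Placeholder values; substitute a domain you are authorized to assess.
build_cmd example.com 200 bing
```

Once the printed line looks right, it can be pasted into the shell and run as-is.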

DESCRIPTION

TheHarvester is a Python-based open-source intelligence (OSINT) tool that gathers email addresses, subdomains, hostnames, employee names, open ports, and banners from public sources such as search engines, DNS servers, and the SHODAN database.
Penetration testers and security professionals use it to collect information about a target organization before a penetration test or security assessment. By discovering publicly exposed assets, it helps map the organization's attack surface.

During the information-gathering phase of a penetration test, the tool helps identify potential attack vectors. Most data sources are queried passively, which avoids direct interaction with the target's infrastructure and minimizes the risk of detection. Options such as DNS brute force (-c) and port scanning (-p) are active and do contact target systems.
TheHarvester is widely used for OSINT and reconnaissance work.

CAVEATS

The tool is only as effective as the public sources it queries: results depend on the availability and accuracy of their data. Some sources require API keys or enforce rate limits. Usage should comply with each data source's terms of service.
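
One way to stay under rate limits is to query sources one at a time with a pause between requests. The sketch below is an illustration under stated assumptions: query_sources and the HARVESTER variable are hypothetical names, the domain and pause length are placeholders, and an appropriate delay depends on each source's policy. By default it runs in dry-run mode and only echoes the commands.

```shell
#!/bin/sh
# Dry run by default: commands are echoed, not executed.
# Set HARVESTER=theHarvester in the environment to run them for real.
HARVESTER=${HARVESTER:-echo theHarvester}

# Hypothetical helper: query_sources DOMAIN PAUSE_SECONDS SOURCE...
query_sources() {
    domain=$1
    pause=$2
    shift 2
    for src in "$@"; do
        # One source per invocation, using the -d and -b options.
        $HARVESTER -d "$domain" -b "$src"
        # Wait before contacting the next source.
        sleep "$pause"
    done
}

# Placeholder domain and a 1-second pause, for illustration only.
query_sources example.com 1 bing crtsh duckduckgo
```

Running one source per invocation also makes it easier to see which source failed when an API key is missing.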

OUTPUT

TheHarvester reports employee names, email addresses, hostnames, and subdomains. Output can be saved in HTML, XML, or JSON.
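
Saved host lists often need cleanup before they are passed to other tools. The sketch below assumes a plain-text file with one hostname per line, which is an assumption about the saved file rather than a documented format. clean_hosts is a hypothetical helper, and the demo input is made up.

```shell
#!/bin/sh
# Normalize a host list: lowercase, drop blank lines, remove duplicates.
# Assumes one hostname per line; the real file layout may differ.
clean_hosts() {
    tr '[:upper:]' '[:lower:]' < "$1" | sed '/^[[:space:]]*$/d' | sort -u
}

# Demo input standing in for a saved host file (hypothetical contents).
tmp=$(mktemp)
printf 'www.example.com\nMAIL.example.com\n\nwww.example.com\n' > "$tmp"
clean_hosts "$tmp"
rm -f "$tmp"
```

Hostnames are case-insensitive, so lowercasing before deduplication keeps entries that differ only in case from being counted twice.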

ETHICAL CONSIDERATIONS

It's essential to use TheHarvester ethically and legally. Always obtain proper authorization before performing reconnaissance activities on a target. Misuse of this tool could lead to legal repercussions.

HISTORY

TheHarvester was developed by Christian Martorella (edge-security) and remains actively maintained. It is popular in the cybersecurity community for its ease of use and its effectiveness at gathering information about target organizations, and it is updated frequently to add new data sources and capabilities.

SEE ALSO

dig(1), nslookup(1), whois(1)
