dirb
Discover website directories and files
TLDR
Scan a webserver using the default wordlist
Scan a webserver using a custom wordlist
Scan a webserver non-recursively
Scan a webserver using a specified user-agent and cookie for HTTP requests
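The scans above map to the following invocations (the host, wordlist path, and cookie value are placeholders):

```shell
# Scan a webserver using the default wordlist (common.txt)
dirb http://example.com

# Scan a webserver using a custom wordlist
dirb http://example.com /path/to/wordlist.txt

# Scan a webserver non-recursively (recursion is on by default)
dirb http://example.com -r

# Scan with a custom user-agent and cookie on every HTTP request
dirb http://example.com -a "Mozilla/5.0" -c "SESSIONID=abc123"
```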
SYNOPSIS
dirb <url_base> [<wordlist_file>] [<options>]
PARAMETERS
-a <agent_string>
Sets the User-Agent HTTP header for all requests.
-b
Uses the supplied path as is, without squashing sequences such as /../ or /./ in the URL.
-c <cookie_string>
Sets a custom HTTP Cookie header for all requests.
-E <certificate>
Specifies the path to a client certificate to use for requests.
-H <header_string>
Adds a custom HTTP header to all requests (e.g., 'Authorization: Bearer token').
-N <status_code>
Ignores responses with the specified HTTP status code (useful when the server answers missing paths with a custom code instead of 404).
-o <file>
Saves the scan results to the specified output file.
-p <proxy:port>
Routes all HTTP traffic through the specified proxy server.
-r
Disables recursive scanning; by default, dirb recurses into every directory it discovers.
-S
Enables silent mode, suppressing the printing of each tested word (useful for dumb terminals).
-t
Does not force a trailing '/' on URLs. Note that dirb is single-threaded and has no option to set a thread count.
-u <user:pass>
Specifies credentials for HTTP Basic Authentication.
-X <extensions>
Appends common file extensions (e.g., '.php,.html,.bak') to each wordlist entry.
-z <delay_ms>
Adds a delay in milliseconds between each HTTP request to avoid detection or rate limiting.
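These options compose freely. A fuller invocation might look like the following (the host, wordlist, proxy, credentials, and delay are all placeholders):

```shell
# Brute-force with .php and .bak appended to each wordlist entry,
# routing through a local proxy, authenticating with HTTP Basic auth,
# saving results to disk, and pausing 100 ms between requests
dirb http://example.com /usr/share/dirb/wordlists/common.txt \
  -X .php,.bak \
  -p 127.0.0.1:8080 \
  -u admin:secret \
  -o results.txt \
  -z 100
```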
DESCRIPTION
dirb is a web content scanner written in C, designed to brute-force directories and files on web servers. It performs a dictionary-based attack, attempting to guess valid paths and file names that might not be directly linked or easily discoverable. Commonly used in penetration testing and security auditing, dirb helps uncover hidden sensitive files (e.g., configuration, backup, logs), unlisted directories, or outdated vulnerable web applications.
It functions by sending HTTP requests for each word in a provided wordlist and analyzing the server's responses (status codes like 200 OK, 301 Moved Permanently, 403 Forbidden, 404 Not Found) to determine resource existence. dirb supports features such as proxies, HTTP authentication, recursive scanning, and custom extensions, making it a versatile tool for web application reconnaissance. Its effectiveness is highly dependent on the quality and comprehensiveness of the wordlists used.
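The core loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not dirb's implementation: the function names and the set of "interesting" status codes are assumptions, and it omits dirb's refinements such as NOT_FOUND fine-tuning and recursion.

```python
# Minimal sketch of a dictionary-based web content scan (NOT dirb's code).
from urllib.request import Request, urlopen
from urllib.error import HTTPError

# Status codes worth reporting; illustrative, not dirb's exact rule set.
INTERESTING = {200, 301, 302, 403}

def probe(base_url, word, user_agent="mini-scanner/0.1"):
    """Request base_url/word and return the HTTP status code."""
    req = Request(f"{base_url.rstrip('/')}/{word}",
                  headers={"User-Agent": user_agent})
    try:
        with urlopen(req) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # 403, 404, etc. arrive as exceptions

def scan(base_url, wordlist):
    """Yield (path, status) pairs for responses worth reporting."""
    for word in wordlist:
        status = probe(base_url, word)
        if status in INTERESTING:
            yield (word, status)
```

A real scanner additionally has to detect custom "not found" pages (see CAVEATS) rather than trusting status codes alone.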
CAVEATS
Wordlist Dependency: The effectiveness of dirb heavily relies on the quality and comprehensiveness of the wordlist used. A poor wordlist will yield poor results.
Rate Limiting & IP Bans: Aggressive scanning can trigger server-side rate limiting, firewall blocks, or IP bans, potentially disrupting service or leading to detection.
Legal and Ethical Use: dirb should only be used against targets for which you have explicit, written permission. Unauthorized scanning can be illegal and unethical.
False Positives/Negatives: May produce false positives (e.g., misinterpreting custom 404 pages as valid content) or false negatives (missing files if the wordlist is incomplete or the server returns non-standard error codes).
DEFAULT WORDLISTS
dirb typically includes several default wordlists, commonly found in /usr/share/dirb/wordlists/. Notable examples include common.txt, big.txt, and small.txt, which provide a solid foundation for initial discovery scans.
RECURSIVE SCANNING BEHAVIOR
dirb scans recursively by default: upon discovering a valid directory, it automatically initiates a new brute-force scan within that subdirectory, allowing a deeper and more comprehensive mapping of the web server's hierarchical structure. The -r option disables this behavior, while -R makes recursion interactive, prompting before descending into each discovered directory.
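The recursion modes can be selected as follows (example.com is a placeholder):

```shell
# Default: recurse into every discovered directory
dirb http://example.com

# Disable recursion for a flat, top-level-only scan
dirb http://example.com -r

# Ask interactively before descending into each discovered directory
dirb http://example.com -R
```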
HISTORY
dirb has long been a foundational tool in web penetration testing, valued for its simplicity and effectiveness in dictionary-based web content discovery. Developed by The Dark Raver and written in C, it is fast and lightweight. While more recent tools such as Gobuster and ffuf, written in Go, offer multithreaded scanning and additional features, dirb remains widely used and is a standard inclusion in security-focused Linux distributions such as Kali Linux.