cewl
Spider websites to generate wordlists
TLDR
Create a wordlist file from the given URL up to 2 links depth
Output an alphanumeric wordlist from the given URL with words of minimum 5 characters
Output a wordlist from the given URL in debug mode including email addresses
Output a wordlist from the given URL using HTTP Basic or Digest authentication
Output a wordlist from the given URL through a proxy
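For reference, the commands behind the items above, using CeWL's long-form flags; file paths, credentials, proxy details, and URLs are placeholders:
cewl --depth 2 --write path/to/wordlist.txt https://example.com
cewl --with-numbers --min_word_length 5 https://example.com
cewl --debug --email https://example.com
cewl --auth_type basic --auth_user username --auth_pass password https://example.com
cewl --proxy_host 127.0.0.1 --proxy_port 8080 https://example.com
For Digest authentication, pass --auth_type digest instead of basic.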
SYNOPSIS
cewl [OPTION...] <URL>
PARAMETERS
-h, --help
Display help message.
-d, --depth <int>
Maximum crawl depth (default: 2)
-m, --min_word_length <int>
Minimum word length to include (default: 3)
-x, --max_word_length <int>
Maximum word length to include.
-w, --write <file>
Write the output wordlist to the given file.
-u, --ua <string>
Custom user agent string.
--lowercase
Convert words to lowercase.
--with-numbers
Accept words containing numbers as well as letters.
-e, --email
Include email addresses in the output.
--email_file <file>
Output file for extracted email addresses.
-a, --meta
Include words from the metadata of documents found while spidering.
-o, --offsite
Allow the spider to visit other sites.
--auth_type <basic|digest>
HTTP authentication type.
--auth_user <username>
HTTP authentication username.
--auth_pass <password>
HTTP authentication password.
--proxy_host <host>
Proxy host to use.
--proxy_port <port>
Proxy port (default: 8080)
-H, --header <name:value>
Extra header to send; may be given multiple times.
--debug
Show extra debug information.
DESCRIPTION
CeWL (Custom Word List generator) is a Ruby-based command-line tool for penetration testing and security assessments. It recursively crawls a given URL to a configurable depth, extracts words from the HTML content it finds, filters them by minimum length and other criteria, and generates a tailored wordlist. This is particularly useful for building site-specific dictionaries for password-cracking attacks, since it captures unique terminology, product names, employee names, and other relevant terms absent from generic wordlists.
Key features include a configurable crawl depth and minimum word length, a custom user agent, extraction of email addresses and document metadata, lowercase conversion, and support for proxies, HTTP authentication, and custom headers. Output can be written to a file for immediate use with tools such as John the Ripper or Hashcat.
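A minimal sketch of that workflow, assuming the captured hashes are raw MD5 in a hypothetical hashes.txt (the file names and the Hashcat hash mode are illustrative, not part of CeWL itself):
cewl -d 2 -m 6 --lowercase -w corp-words.txt https://www.example.com
john --wordlist=corp-words.txt --rules hashes.txt
hashcat -a 0 -m 0 hashes.txt corp-words.txt
The first command builds the wordlist; the second runs John the Ripper with its word-mangling rules; the third runs a straight Hashcat dictionary attack (-a 0) against MD5 hashes (-m 0).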
CeWL does not honour robots.txt by default, so it should only be run against sites you have permission to test. It is lightweight and fast on small sites, but can be resource-intensive on large ones.
CAVEATS
Requires Ruby. Use only against systems you have permission to test. Ignores robots.txt by default. May be blocked by WAFs. Can be slow and memory-intensive on large sites.
BASIC EXAMPLE
cewl -d 2 -m 5 https://example.com -w wordlist.txt
Generates wordlist from example.com (depth 2, min 5 letters).
ADVANCED EXAMPLE
cewl -d 3 -u 'Mozilla/5.0...' --proxy_host localhost --proxy_port 8080 -e --email_file emails.txt -w custom.txt https://target.com
Crawls through a local proxy with a custom user agent, writing the wordlist to custom.txt and extracted email addresses to emails.txt.
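A further variant (the output file name is a placeholder) that also pulls words from document metadata and counts occurrences:
cewl -d 2 -a -c -w meta-words.txt https://target.com
The -a flag includes words found in the metadata of documents discovered while spidering, and -c reports how many times each word was seen.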
HISTORY
Developed by Robin Wood (digininja) in the late 2000s. Written in Ruby and shipped with Kali Linux and other security-focused distributions. Actively maintained, with later releases adding authentication, proxy support, and metadata extraction.


