cewl

Spider websites to generate wordlists

TLDR

Create a wordlist file from the given URL up to 2 links depth

$ cewl [[-d|--depth]] 2 [[-w|--write]] [path/to/wordlist.txt] [url]

Output an alphanumeric wordlist from the given URL with words of minimum 5 characters
$ cewl --with-numbers [[-m|--min_word_length]] 5 [url]

Output a wordlist from the given URL in debug mode including email addresses
$ cewl --debug [[-e|--email]] [url]

Output a wordlist from the given URL using HTTP Basic or Digest authentication
$ cewl --auth_type [basic|digest] --auth_user [username] --auth_pass [password] [url]

Output a wordlist from the given URL through a proxy
$ cewl --proxy_host [host] --proxy_port [port] [url]

SYNOPSIS

cewl [OPTION...] <URL>

PARAMETERS

-h, --help
    Display help message.

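-d, --depth <int>
    Maximum crawl depth (default: 2).

-m, --min_word_length <int>
    Minimum word length to include (default: 3).

-x, --max_word_length <int>
    Maximum word length to include (default: unset).

-o, --offsite
    Allow the spider to follow links to other sites.

-w, --write <file>
    Write the wordlist to a file instead of standard output.

-u, --ua <string>
    Custom user agent string.

-n, --no-words
    Do not output the wordlist.

--lowercase
    Convert words to lowercase.

--with-numbers
    Accept words containing digits as well as letters.

-a, --meta
    Include metadata found in downloaded documents.

--meta_file <file>
    File to save extracted metadata.

-e, --email
    Include email addresses.

--email_file <file>
    File to save extracted email addresses.

-c, --count
    Show the count for each word found.

-v, --verbose
    Verbose output.

--debug
    Print extra debugging information.

--auth_type <basic|digest>
    HTTP authentication type.

--auth_user <username>
    Authentication username.

--auth_pass <password>
    Authentication password.

--proxy_host <host>
    Proxy host.

--proxy_port <port>
    Proxy port (default: 8080).

-H, --header <name:value>
    Extra request header; may be given multiple times.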

DESCRIPTION

CeWL (Custom Word List generator) is a Ruby command-line tool used in security and penetration testing. It crawls a given website to a configurable depth, extracts words from the pages it finds, filters them by length and, optionally, by whether they contain digits, and prints the resulting wordlist. This is particularly useful for building site-specific dictionaries for password cracking, because the list captures terminology, product names, employee names, and other terms that generic wordlists lack.
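
The extract-and-filter step described above can be approximated with standard text tools. The following is a rough sketch, not CeWL's actual implementation: it strips tags with a naive regular expression, splits on non-letters, and keeps words of at least a given length. The function name `words_from_html` and the sample HTML are assumptions made for illustration.

```shell
#!/bin/sh
# Rough approximation of word extraction and length filtering.
# words_from_html is a hypothetical helper, not part of CeWL.
words_from_html() {
    min_len="${1:-3}"                 # minimum word length, mirroring -m
    sed 's/<[^>]*>/ /g' |             # naive tag stripping (tags on one line only)
        tr -cs '[:alpha:]' '\n' |     # one word per line, letters only
        awk -v n="$min_len" 'length($0) >= n' |
        LC_ALL=C sort -u              # unique words in byte order
}

# Sample input (made up for this sketch); keep words of 5+ letters
printf '<h1>Acme Widgets</h1><p>The Acme team ships widgets.</p>\n' |
    words_from_html 5
```

Because nothing is lowercased here, `Widgets` and `widgets` are kept as separate entries, which matches the reason the tool offers `--lowercase` as a separate option.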

Key features include configurable crawl depth and minimum word length, a custom user agent string, extraction of email addresses and document metadata, lowercase conversion, and support for proxies, HTTP Basic or Digest authentication, and custom request headers. The wordlist can be written to a file for direct use with tools like John the Ripper or Hashcat.
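
Before handing a generated list to a cracking tool, it is common to normalize it. The sketch below lowercases and de-duplicates a wordlist; `normalize_wordlist` and all file names are hypothetical, and the commented John the Ripper and Hashcat invocations assume a file `hashes.txt` containing raw MD5 hashes.

```shell
#!/bin/sh
# normalize_wordlist is a hypothetical helper, not part of CeWL.
# Usage: normalize_wordlist INPUT OUTPUT
normalize_wordlist() {
    tr '[:upper:]' '[:lower:]' < "$1" |   # fold case
        awk 'NF && !seen[$0]++' > "$2"    # drop blank lines and duplicates, keep first-seen order
}

# Example workflow (file names are assumptions):
#   cewl -w raw.txt https://example.com
#   normalize_wordlist raw.txt wordlist.txt
#   john --wordlist=wordlist.txt hashes.txt
#   hashcat -a 0 -m 0 hashes.txt wordlist.txt   # -a 0: dictionary attack, -m 0: raw MD5
```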

CeWL does not honor robots.txt by default, so crawl only sites you have permission to test. It is lightweight and fast on small sites but can be slow and memory-hungry on large ones.

CAVEATS

Requires Ruby; use ethically with permission only. Ignores robots.txt by default. May be blocked by WAFs. Slow on large sites; high memory usage possible.
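
Since the tool depends on Ruby, a quick check of the environment can save a confusing failure later. This is a minimal sketch; `have_cmd` is a hypothetical helper name, and it only tests whether the commands are on `PATH`, not whether the required Ruby gems are installed.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the two commands a CeWL run needs
for tool in ruby cewl; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found on PATH"
    fi
done
```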

BASIC EXAMPLE

cewl -d 2 -m 5 https://example.com -w wordlist.txt
Generates a wordlist from example.com, crawling to depth 2 and keeping words of at least 5 characters.

ADVANCED EXAMPLE

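cewl -d 3 -u 'Mozilla/5.0...' --proxy_host localhost --proxy_port 8080 -w custom.txt -e --email_file emails.txt https://target.com
Crawls to depth 3 through a local proxy with a custom user agent, writing the wordlist to custom.txt and extracted email addresses to emails.txt.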

HISTORY

Developed by Robin Wood (digininja) around 2007. Written in Ruby and packaged in Kali Linux and other security distributions. Still maintained, with authentication and proxy support added over time.

SEE ALSO

wget(1), curl(1), lynx(1), john(1), hashcat(1)