tcpflow
Record TCP connection data for analysis
TLDR
Show all data on the given interface and port:
tcpflow -c -i eth0 port 80
SYNOPSIS
tcpflow [options] [BPF_filter_expression]
Examples:
tcpflow -i eth0 -o /var/tmp/flow "port 80 and host example.com"
tcpflow -r capture.pcap -o flows "tcp and port 443"
PARAMETERS
-a
Enable all post-processing of the reassembled flows, e.g., decode HTTP streams where possible.
-c
Print flows to standard output instead of creating individual files.
-D
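Print flow data in hexadecimal, which is most useful together with -c. Debugging output is controlled separately by the lowercase -d option, which takes a debug level.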
-e
Enable the named scanner (a post-processing module), e.g., http or md5. Use the BPF filter expression, not this option, to exclude traffic by port or host.
-F
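Specify the format of the output file names and layout. For example, -Ft prepends a timestamp to every file name, -Fc appends the connection counter, and -FX writes no flow files at all.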
-i
Specify the network interface to capture packets from (e.g., eth0, wlan0).
-J
Output flow data in JSON format.
-o
Specify the output directory where the reassembled flow files will be saved.
-r
Read packets from a pcap file instead of capturing from a live interface.
-s
Strip non-printable characters from the output, replacing each of them with a dot ('.').
-v
Enable verbose output, displaying more information about the capture process.
-X
Write an XML (DFXML) report describing the processed flows to the given file.
BPF_filter_expression
A Berkeley Packet Filter expression to filter the captured traffic (e.g., 'port 80', 'host 192.168.1.1').
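To illustrate the effect of the -s option on a payload, the following Python sketch approximates the substitution. It is not tcpflow's own code: the function name strip_nonprintable is hypothetical, and the choice to keep printable ASCII plus line feed and carriage return is an assumption made for this example.

```python
def strip_nonprintable(data: bytes) -> bytes:
    """Replace every byte outside printable ASCII with b'.'.

    Assumption for this sketch: line feed and carriage return are kept
    so that text protocols such as HTTP stay readable.
    """
    keep = {0x0A, 0x0D}  # LF and CR
    return bytes(
        b if (0x20 <= b <= 0x7E or b in keep) else 0x2E  # 0x2E is '.'
        for b in data
    )


if __name__ == "__main__":
    sample = b"GET / HTTP/1.1\r\n\x00\x01\xffHost: example.com\r\n"
    print(strip_nonprintable(sample).decode("ascii"))
```

A filter like this is lossy, so it suits quick inspection on the console rather than extracting binary content.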
DESCRIPTION
tcpflow is a command-line network analysis utility that captures TCP traffic and reassembles each data stream in the order given by its TCP sequence numbers. Unlike tcpdump, which displays individual packets (headers and payloads), tcpflow reconstructs the conversation and writes each direction of a TCP connection to a separate file. For instance, an HTTP request and its corresponding response are saved into distinct files, making it straightforward to analyze application-layer data.
It is useful for network forensics, debugging client-server interactions, extracting files or content from network traffic, and monitoring specific protocol flows. It supports BPF (Berkeley Packet Filter) syntax for precise traffic filtering and can process either live network captures or existing .pcap files. Flow contents are written as raw bytes by default, can be made printable with -s, and can be accompanied by structured XML or JSON output for automated processing.
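The idea of reassembling a stream from sequence numbers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not tcpflow's implementation: the function name reassemble, the (seq, payload) tuple layout, and the policy of zero-filling gaps are all hypothetical choices made for illustration.

```python
def reassemble(segments, first_seq):
    """Rebuild one direction of a TCP stream from (seq, payload) pairs.

    segments  -- iterable of (sequence_number, payload_bytes), in any order
    first_seq -- sequence number of the first data byte of the stream

    Each payload is placed at its offset from first_seq, so out-of-order
    arrival does not matter. Retransmitted data overwrites the earlier
    copy, and missing ranges are left as zero bytes. Offsets are taken
    modulo 2**32 because TCP sequence numbers wrap around.
    """
    stream = bytearray()
    for seq, payload in segments:
        offset = (seq - first_seq) % 2**32
        end = offset + len(payload)
        if end > len(stream):
            # Grow the buffer so the payload fits at its offset.
            stream.extend(b"\x00" * (end - len(stream)))
        stream[offset:end] = payload
    return bytes(stream)


if __name__ == "__main__":
    # The second segment arrives first, yet the result is in stream order.
    out_of_order = [(1006, b"world"), (1001, b"hello")]
    print(reassemble(out_of_order, first_seq=1001))
```

A real reassembler also has to bound memory use and decide when a connection has ended, which this sketch ignores.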
CAVEATS
Large Output Files: Depending on traffic volume, tcpflow can generate a large number of files or very large single output files, potentially consuming significant disk space quickly.
Memory Usage: Processing very high-volume traffic or large PCAP files can consume substantial system memory, especially when reassembling many concurrent streams.
Connectionless Protocol Limitations: tcpflow is designed for TCP streams. It does not reassemble UDP or other connectionless protocols.
Encrypted Traffic: tcpflow can capture encrypted traffic (e.g., HTTPS) but does not decrypt it. The reassembled streams contain the encrypted data unless the traffic is decrypted with external tools, which requires access to the TLS session keys.
OUTPUT FILE NAMING CONVENTION
tcpflow names each output file after the connection's endpoints, in the form source_ip.source_port-destination_ip.destination_port. By default, IPv4 octets are zero-padded to three digits and ports to five, for example 192.168.001.010.51234-093.184.216.034.00080. Each direction of a conversation is saved separately, and a connection counter may be appended when multiple flows occur between the same endpoints. This predictable naming simplifies post-analysis and automation.
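As a minimal sketch of how this naming scheme can be consumed by a script, the Python snippet below parses a flow file name into its endpoints. It assumes the default IPv4 form with zero-padded fields (three digits per octet, five per port), such as 192.168.001.010.51234-093.184.216.034.00080. The names FlowName and parse_flow_name are hypothetical, and anything following the destination port is returned unparsed in a suffix field.

```python
import re
from typing import NamedTuple, Optional

# Assumed default layout: zero-padded IPv4 octets and ports.
FLOW_NAME = re.compile(
    r"^(?P<src_ip>\d{3}(?:\.\d{3}){3})\.(?P<src_port>\d{5})"
    r"-(?P<dst_ip>\d{3}(?:\.\d{3}){3})\.(?P<dst_port>\d{5})"
    r"(?P<suffix>.*)$"
)


class FlowName(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    suffix: str  # anything after the destination port, left unparsed


def _unpad(ip: str) -> str:
    """Turn '093.184.216.034' into '93.184.216.34'."""
    return ".".join(str(int(octet)) for octet in ip.split("."))


def parse_flow_name(name: str) -> Optional[FlowName]:
    """Return the parsed endpoints, or None if the name does not match."""
    m = FLOW_NAME.match(name)
    if m is None:
        return None
    return FlowName(
        _unpad(m["src_ip"]),
        int(m["src_port"]),
        _unpad(m["dst_ip"]),
        int(m["dst_port"]),
        m["suffix"],
    )


if __name__ == "__main__":
    print(parse_flow_name("192.168.001.010.51234-093.184.216.034.00080"))
```

Returning None for non-matching names lets a caller skip other files in the output directory, such as reports, without special cases.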
PRACTICAL USE CASES
tcpflow is commonly used for extracting files transferred over HTTP, analyzing email traffic (SMTP/POP3/IMAP), debugging custom application protocols, and identifying data exfiltration by reconstructing network conversations. It provides the actual application-layer data without the need to piece packets together manually.
INTEGRATION WITH OTHER TOOLS
Structured XML or JSON output makes tcpflow easy to integrate with scripting languages (Python, Perl) and other analysis tools for automated processing, data mining, and further analysis pipelines.
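Even without the structured formats, a script can work directly on the flow files. The Python sketch below pairs the two directions of each conversation found in an output directory, assuming plain file names of the form A-B where the opposite direction is named B-A and no extra suffix is present. The function names pair_directions and pair_directory are hypothetical.

```python
from pathlib import Path


def pair_directions(names):
    """Pair each flow name 'A-B' with its reverse 'B-A' when present.

    Returns a list of (name, reverse_or_None) tuples. Names without a
    '-' (for example a report file) are ignored.
    """
    names = set(names)
    pairs, seen = [], set()
    for name in sorted(names):
        if name in seen or "-" not in name:
            continue
        a, b = name.split("-", 1)
        reverse = f"{b}-{a}"
        seen.add(name)
        if reverse in names:
            seen.add(reverse)
            pairs.append((name, reverse))
        else:
            # Only one direction carried data (or was captured).
            pairs.append((name, None))
    return pairs


def pair_directory(outdir):
    """Apply pair_directions to the regular files in a tcpflow output directory."""
    return pair_directions(p.name for p in Path(outdir).iterdir() if p.is_file())


if __name__ == "__main__":
    example = [
        "192.168.001.010.51234-093.184.216.034.00080",
        "093.184.216.034.00080-192.168.001.010.51234",
        "report.xml",
    ]
    for forward, backward in pair_directions(example):
        print(forward, "<->", backward)
```

With the pairs in hand, a script can, for example, read a request file and its matching response file side by side.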
HISTORY
tcpflow was originally developed by Jeremy Elson. Its development aimed to fill a gap between raw packet capture tools like tcpdump and full-fledged protocol analyzers, by providing an easy way to reconstruct and save the actual data exchanged over TCP connections. It has been maintained and improved over the years, incorporating features like structured output formats (XML, JSON) to facilitate automated analysis. While not as widely known as tcpdump or Wireshark, it remains a valuable specialized tool in the network analyst's toolkit due to its unique stream reassembly capabilities.