bedtools
Compare and manipulate genomic intervals
TLDR
Intersect file [a] with file(s) [b], requiring overlaps to be on the same [s]trand, and save the result to a file
Intersect two files with a [l]eft [o]uter [j]oin, i.e. report each feature from file1 and NULL if no overlap with file2
Intersect two pre-sorted files using the more memory-efficient algorithm
[g]roup a file by the first three and the fifth columns and apply the sum [o]peration to the sixth [c]olumn
Convert a BAM-formatted [i]nput file to BED format
For each feature in file1.bed, find the closest feature in file2.bed and report the [d]istance in an extra column (input files must be sorted)
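The command lines for the entries above are not shown on this page. As a hedged sketch, the following illustrates how each entry might be invoked. The file names (a.bed, b.bed, grouped.txt, reads.bam) and their contents are made up for illustration, and the bedtools calls are skipped when bedtools is not installed.

```shell
# Illustrative sketch only: all file names and contents are assumptions.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny BED6 files (chrom, start, end, name, score, strand), already sorted.
printf 'chr1\t100\t200\ta1\t5\t+\nchr1\t300\t400\ta2\t7\t-\nchr2\t50\t150\ta3\t2\t+\n' > a.bed
printf 'chr1\t150\t250\tb1\t1\t+\nchr1\t350\t450\tb2\t3\t+\nchr2\t500\t600\tb3\t4\t-\n' > b.bed

# A six-column table whose sixth column is numeric, pre-grouped on columns 1-3 and 5.
printf 'chr1\t100\t200\tx\t5\t10\nchr1\t100\t200\ty\t5\t20\nchr2\t50\t150\tz\t2\t7\n' > grouped.txt

if command -v bedtools >/dev/null 2>&1; then
    # Strand-aware intersection (-s), saved to a file.
    bedtools intersect -a a.bed -b b.bed -s > same_strand.bed

    # Left outer join (-loj): every feature of a.bed is reported,
    # with placeholder fields when nothing in b.bed overlaps it.
    bedtools intersect -a a.bed -b b.bed -loj > left_outer.bed

    # Memory-efficient algorithm for inputs that are already sorted (-sorted).
    bedtools intersect -a a.bed -b b.bed -sorted > sorted_intersect.bed

    # Group on columns 1, 2, 3 and 5 (-g) and sum column 6 (-c 6 -o sum).
    bedtools groupby -i grouped.txt -g 1,2,3,5 -c 6 -o sum > grouped_sum.txt

    # Closest feature in b.bed for each feature in a.bed, with distance (-d).
    bedtools closest -a a.bed -b b.bed -d > closest.bed

    # BAM to BED conversion; reads.bam is a hypothetical file you must supply.
    if [ -f reads.bam ]; then
        bedtools bamtobed -i reads.bam > reads.bed
    fi
else
    echo "bedtools not found on PATH; skipping the bedtools calls" >&2
fi
```

Writing each result to its own file keeps the examples independent, so any one of them can be run in isolation.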
SYNOPSIS
bedtools <subcommand> [options] <input files>
PARAMETERS
-h, --help
Show help message and exit
--version
Print version information
(no arguments)
Print the usage summary, which lists all available subcommands
-ci, --check-intervals
Check if intervals are proper (chrom/start/end)
-iobuf N
Input buffer size; a subcommand-level option (e.g., for intersect) that takes an integer with an optional K/M/G suffix, such as 500M
DESCRIPTION
bedtools is a fast, flexible collection of utilities for genome arithmetic (e.g., intersect, merge, coverage) that operate on genomic intervals. It enables comparisons between datasets of different types (e.g., features vs. features, features vs. aligned reads) without a centralized database, which makes it well suited to high-throughput genomic analyses.
Key features include support for multiple formats (BED, GFF/GTF, VCF, BAM) and dozens of subcommands such as intersect, closest, bamtobed, and multicov. The individual tools are designed to be chained in pipelines with Unix tools like sort and awk, so many analyses need no custom programming.
Common workflows include finding overlaps between peaks and genes, calculating coverage from BAM alignments, and merging intervals. With pre-sorted inputs and the -sorted option, subcommands such as intersect use an algorithm that needs far less memory on large datasets. bedtools is widely used in NGS pipelines for ChIP-seq, RNA-seq, and variant calling.
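As a hedged illustration of the peaks-versus-genes workflow and of chaining bedtools with awk, the sketch below counts peaks per gene and keeps only genes with at least one peak. The files genes.bed and peaks.bed and their contents are invented for the example.

```shell
# Illustrative sketch only: file names and coordinates are assumptions.
workdir=$(mktemp -d)
cd "$workdir"

# Minimal BED4 inputs: two genes and three peaks on chr1.
printf 'chr1\t1000\t5000\tgeneA\nchr1\t8000\t9000\tgeneB\n' > genes.bed
printf 'chr1\t1200\t1300\tpeak1\nchr1\t4000\t4100\tpeak2\nchr1\t20000\t20100\tpeak3\n' > peaks.bed

if command -v bedtools >/dev/null 2>&1; then
    # -c appends the number of overlapping peaks to each gene;
    # awk then keeps only genes whose last column (the count) is above zero.
    bedtools intersect -a genes.bed -b peaks.bed -c \
        | awk '$NF > 0' > genes_with_peaks.bed
else
    echo "bedtools not found on PATH; skipping" >&2
fi
```

Filtering in awk rather than inside bedtools keeps the count column available for later steps, such as ranking genes by peak number.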
CAVEATS
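Some subcommands (e.g., merge and closest) require input sorted by chromosome and then by start position (use sort -k1,1 -k2,2n or bedtools sort); intersect accepts unsorted input by default. Large BAM/BED files can be memory-intensive. The -sorted option does not sort the output: it tells bedtools that the inputs are already sorted so that it can use the memory-efficient algorithm. Where a subcommand supports it, -iobuf tunes the input buffer size.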
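A minimal sketch of the sorting step, assuming a small made-up file named unsorted.bed: the intervals are sorted by chromosome and start position with sort, and the result is then passed to bedtools merge when bedtools is available.

```shell
# Illustrative sketch only: unsorted.bed and its contents are assumptions.
workdir=$(mktemp -d)
cd "$workdir"

# Four BED3 intervals in arbitrary order.
printf 'chr2\t50\t150\nchr1\t300\t400\nchr1\t100\t200\nchr1\t150\t250\n' > unsorted.bed

# Sort by chromosome name (field 1), then numerically by start (field 2).
sort -k1,1 -k2,2n unsorted.bed > sorted.bed

if command -v bedtools >/dev/null 2>&1; then
    # merge requires sorted input; overlapping intervals are combined.
    bedtools merge -i sorted.bed > merged.bed
else
    echo "bedtools not found on PATH; skipping merge" >&2
fi
```

bedtools sort -i unsorted.bed would produce the same ordering for this input; plain sort is used here so that the sorting step runs without bedtools.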
POPULAR SUBCOMMANDS
intersect: Overlaps between files
merge: Combine overlapping intervals
closest: Nearest feature search
bamtobed: BAM to BED conversion
coverage: Read depth per interval
INPUT REQUIREMENTS
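Interval files need at least three columns (BED3: chrom, start, end); strand-aware options such as -s need the strand column of BED6. Chromosome names must match across files (chr1 and 1 are treated as different chromosomes). Sort files first, for example with bedtools sort, for subcommands that require sorted input.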
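One way to reconcile chromosome names is to rewrite the first column before running bedtools. The sketch below, which assumes a made-up file named plain_names.bed that uses bare chromosome names, adds a chr prefix with awk. It handles only the prefix; other naming differences (for example MT vs. chrM for the mitochondrial genome) would need their own mapping.

```shell
# Illustrative sketch only: plain_names.bed and its contents are assumptions.
workdir=$(mktemp -d)
cd "$workdir"

# BED3 intervals whose chromosome names lack the "chr" prefix.
printf '1\t100\t200\n2\t50\t150\n1\t30\t80\n' > plain_names.bed

# Prepend "chr" to column 1 and keep the output tab-separated.
awk 'BEGIN { OFS = "\t" } { $1 = "chr" $1; print }' plain_names.bed > chr_names.bed

if command -v bedtools >/dev/null 2>&1; then
    # Sort the renamed file by chromosome and start position.
    bedtools sort -i chr_names.bed > chr_names.sorted.bed
else
    echo "bedtools not found on PATH; skipping bedtools sort" >&2
fi
```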
HISTORY
Developed by Aaron Quinlan with Ira Hall in 2008-2009 and first released in 2009. The project evolved from BEDTools into bedtools v2 (2012 onward); as of 2024 the current release is v2.31.1. It has been a standard bioinformatics tool for more than 15 years and has been cited thousands of times.


