bedtools

Compare and manipulate genomic intervals

TLDR

Intersect file [a] and file(s) [b], requiring overlapping features to be on the same [s]trand, and save the result to a file

$ bedtools intersect -a [path/to/file_A] -b [path/to/file_B1 path/to/file_B2 ...] -s > [path/to/output_file]

Intersect two files with a [l]eft [o]uter [j]oin, i.e. report each feature from file1 and NULL if no overlap with file2

$ bedtools intersect -a [path/to/file1] -b [path/to/file2] -loj > [path/to/output_file]

Use a more efficient algorithm to intersect two pre-sorted files

$ bedtools intersect -a [path/to/file1] -b [path/to/file2] -sorted > [path/to/output_file]
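
The -sorted option expects both inputs to be ordered by chromosome and then by start position. A minimal sketch of preparing such files with sort follows; the file names and coordinates are made up for the demo, and the bedtools call only runs if bedtools is installed.

```shell
# Work in a throwaway directory; a.bed and b.bed are hypothetical demo files.
workdir=$(mktemp -d)
printf 'chr2\t50\t80\nchr1\t300\t400\nchr1\t100\t200\n' > "$workdir/a.bed"
printf 'chr2\t10\t60\nchr1\t150\t250\n' > "$workdir/b.bed"

# Sort by chromosome (column 1), then numerically by start (column 2).
# LC_ALL=C keeps the chromosome ordering identical across both files.
LC_ALL=C sort -k1,1 -k2,2n "$workdir/a.bed" > "$workdir/a.sorted.bed"
LC_ALL=C sort -k1,1 -k2,2n "$workdir/b.bed" > "$workdir/b.sorted.bed"

# Intersect the sorted files (skipped when bedtools is not on PATH).
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a "$workdir/a.sorted.bed" -b "$workdir/b.sorted.bed" -sorted
fi
```

Both files should be sorted with the same command so that their chromosome order agrees.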

[g]roup a file by the first three and the fifth columns and apply the sum [o]peration to the sixth [c]olumn

$ bedtools groupby -i [path/to/file] -g 1-3,5 -c 6 -o sum
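
To make the grouping concrete, here is a small sketch on a made-up six-column file, with the grouping columns passed to -g and the summed column to -c. The awk command is only a rough stand-in for the same calculation on input whose grouped rows are adjacent; it is not how bedtools implements groupby.

```shell
# Hypothetical input: chrom, start, end, name, strand, score.
gb_dir=$(mktemp -d)
printf 'chr1\t10\t20\tx\t+\t3\nchr1\t10\t20\ty\t+\t4\nchr1\t30\t40\tz\t-\t5\n' > "$gb_dir/in.tsv"

# The bedtools call itself (skipped when bedtools is not on PATH).
if command -v bedtools >/dev/null 2>&1; then
    bedtools groupby -i "$gb_dir/in.tsv" -g 1-3,5 -c 6 -o sum
fi

# awk stand-in: sum column 6 over consecutive rows sharing columns 1-3 and 5.
awk 'BEGIN { FS = OFS = "\t" }
     {
         key = $1 OFS $2 OFS $3 OFS $5
         if (NR > 1 && key != prev) { print prev, sum; sum = 0 }
         prev = key
         sum += $6
     }
     END { if (NR > 0) print prev, sum }' "$gb_dir/in.tsv" > "$gb_dir/grouped.tsv"

cat "$gb_dir/grouped.tsv"
```

The first two rows share the same key and collapse into one line with a sum of 7; the third row stays on its own with a sum of 5.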

Convert a BAM-formatted [i]nput file to BED format

$ bedtools bamtobed -i [path/to/file.bam] > [path/to/file.bed]

For each feature in file1.bed, find the closest one in file2.bed and report the [d]istance in an extra column (input files must be sorted)

$ bedtools closest -a [path/to/file1.bed] -b [path/to/file2.bed] -d

SYNOPSIS

bedtools <subcommand> [options] <input files>

-h, --help
    Show help message and exit

-version
    Print version information

-list
    List all available subcommands

-ci, --check-intervals
    Check if intervals are proper (chrom/start/end)

-iobuf N
    Input/output buffer size (e.g., 500M, default 128M)

DESCRIPTION

bedtools is a fast, flexible collection of utilities for genome arithmetic (e.g., intersect, merge, coverage) on genomic intervals. It compares data of arbitrary types (e.g., locations vs. locations, locations vs. reads) directly from files, without a centralized database, which makes it well suited to high-throughput genomic analyses.

Key features include support for common formats (BED, GFF/GTF, VCF, BAM, bedGraph) and dozens of subcommands such as intersect, closest, bamtobed, and multicov. Each subcommand reads files or standard input and writes to standard output, so biologists can build analyses by chaining commands in pipelines with Unix tools like sort and awk instead of writing custom programs.

Common workflows include finding overlaps between peaks and genes, calculating coverage from BAM alignments, and merging intervals. Memory use and run time on large datasets are lowest when inputs are sorted by chromosome and start position, which some subcommands can exploit (e.g., intersect -sorted). bedtools is widely used in NGS pipelines for ChIP-seq, RNA-seq, and variant calling.
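
As an illustration of chaining with sort and awk, the sketch below sorts a made-up peak file, merges overlapping intervals, and appends each merged interval's length. The file name and coordinates are assumptions for the demo, and the bedtools step only runs if bedtools is installed.

```shell
# Hypothetical peak file: chrom, start, end, name.
pk_dir=$(mktemp -d)
printf 'chr1\t150\t300\tpeakB\nchr1\t100\t200\tpeakA\nchr2\t10\t60\tpeakC\n' > "$pk_dir/peaks.bed"

# Step 1: sort by chromosome, then start (merge requires sorted input).
LC_ALL=C sort -k1,1 -k2,2n "$pk_dir/peaks.bed" > "$pk_dir/peaks.sorted.bed"

# Step 2: merge overlapping intervals, then use awk to append end - start.
if command -v bedtools >/dev/null 2>&1; then
    bedtools merge -i "$pk_dir/peaks.sorted.bed" \
        | awk 'BEGIN { FS = OFS = "\t" } { print $0, $3 - $2 }'
fi
```

Writing the sorted file to disk is a convenience for reuse; the sort output could equally be piped into bedtools merge with -i stdin.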
