bedtools
Compare and manipulate genomic intervals
TLDR
Intersect file [a] and file(s) [b] regarding the sequences' [s]trand and save the result to a specific file
Intersect two files with a [l]eft [o]uter [j]oin, i.e. report each feature from file1 and NULL if no overlap with file2
Use a more efficient algorithm to intersect two pre-sorted files
[g]roup a file based on the first three and the fifth [c]olumn and apply the sum [o]peration on the sixth column
Convert a BAM-formatted [i]nput file to BED format
For each feature in file1.bed, find the closest feature in file2.bed and write the [d]istance between them in an extra column (input files must be sorted)
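The descriptions above correspond to command lines along the following lines. This is a sketch, not output copied from the bedtools documentation: it only assembles the commands as strings and does not run bedtools. The file names (`a.bed`, `b1.bed`, `b2.bed`, `reads.bam`) and the dictionary keys are hypothetical placeholders.

```python
import shlex

# Hypothetical input files, used only for illustration.
A, B1, B2, BAM = "a.bed", "b1.bed", "b2.bed", "reads.bam"

# One argument list per TLDR entry, in the same order as the list above.
COMMANDS = {
    "strand_aware_intersect": ["bedtools", "intersect", "-a", A, "-b", B1, B2, "-s"],
    "left_outer_join": ["bedtools", "intersect", "-a", A, "-b", B1, "-loj"],
    "sorted_intersect": ["bedtools", "intersect", "-a", A, "-b", B1, "-sorted"],
    "group_and_sum": ["bedtools", "groupby", "-i", A, "-g", "1-3,5", "-c", "6", "-o", "sum"],
    "bam_to_bed": ["bedtools", "bamtobed", "-i", BAM],
    "closest_with_distance": ["bedtools", "closest", "-a", A, "-b", B1, "-d"],
}


def render(name):
    """Return the command as a single shell-quoted string."""
    return shlex.join(COMMANDS[name])


if __name__ == "__main__":
    for name in COMMANDS:
        # In a shell, append "> output_file" to save the result.
        print(render(name))
```

bedtools writes its results to standard output, so saving to a file is done with a shell redirection rather than a dedicated option.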
SYNOPSIS
bedtools <subcommand> [options]
PARAMETERS
-a <BED/GFF/VCF>
The primary BED/GFF/VCF file. Required for most operations.
-b <BED/GFF/VCF>
The secondary BED/GFF/VCF file. Required for operations that compare two files.
-abam <BAM>
The BAM file to use for operations involving BAM data for file A.
-bam <BAM>
The BAM file to use for operations involving BAM data for file B.
-bedpe <BEDPE>
Use BEDPE input format (e.g., for paired-end reads).
-header
Print the header from the A file prior to results.
-sorted
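Use the faster, memory-efficient algorithm designed for input files that are already sorted by chromosome and then by start position. Only valid when the inputs really are sorted that way.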
-g <genome file>
Genome file to use (e.g., sizes of chromosomes). Required for some commands.
-d
Report the distance to the nearest feature in B as an extra column (used with the closest subcommand; other subcommands may use -d differently).
-v
Only report those entries in A that have no overlap in B.
-wa
Write the original A entry along with each reported overlap.
-wb
Write the original B entry along with each reported overlap.
-f <float>
Minimum overlap required, as a fraction of the A entry. Takes values from 0 to 1; the default is 1E-9, i.e. a single base pair.
-r
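Require that the fraction of overlap given by -f be reciprocal, i.e. met for both A and B. For example, with -f 0.90 -r, the overlap must cover at least 90% of A and at least 90% of B.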
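How the overlap filters combine may be easier to see in code. The following is a minimal pure-Python sketch of the logic, not bedtools' own implementation. It assumes intervals on the same chromosome given as 0-based, half-open `(start, end)` tuples, that `-f` is measured as a fraction of the A entry, and that `-r` applies the same threshold to the B entry. The function names are hypothetical.

```python
def overlap_bp(a, b):
    """Number of bases shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def passes(a, b, f=1e-9, reciprocal=False):
    """Mimic -f (minimum fraction of A) and, optionally, -r (same for B)."""
    shared = overlap_bp(a, b)
    if shared == 0:
        return False
    ok = shared / (a[1] - a[0]) >= f
    if reciprocal:
        ok = ok and shared / (b[1] - b[0]) >= f
    return ok


def without_overlap(a_entries, b_entries):
    """Mimic -v: keep the A entries that overlap nothing in B."""
    return [a for a in a_entries if not any(passes(a, b) for b in b_entries)]


if __name__ == "__main__":
    a, b = (100, 200), (150, 1000)
    print(overlap_bp(a, b))                       # 50 shared bases
    print(passes(a, b, f=0.5))                    # half of A is covered
    print(passes(a, b, f=0.5, reciprocal=True))   # but far less than half of B
```

In this example the 50 shared bases make up 50% of A but only about 6% of B, which is why adding the reciprocal requirement rejects the pair.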
DESCRIPTION
Bedtools is a suite of powerful utilities for genome arithmetic. It allows users to perform a wide range of operations on genomic intervals, such as intersections, unions, complements, and nearest-neighbor searches. Bedtools operates on various file formats, including BED, GFF, VCF, BAM, and more, making it highly versatile for genomic analysis.
Its primary goal is to enable researchers to efficiently manipulate and analyze genomic data, uncovering relationships between different genomic features. It is commonly used in genomics research for tasks like identifying overlapping genomic regions between datasets, calculating coverage statistics, extracting sequences from specific genomic intervals, and simulating sequencing experiments. Bedtools is designed to be fast, memory-efficient, and easily scriptable, making it a cornerstone of many bioinformatics pipelines.
CAVEATS
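Many bedtools operations require input sorted by chromosome and then by start position (e.g., using `sort -k1,1 -k2,2n`). The `-sorted` option does not sort anything: it tells bedtools that the inputs are already sorted so that it can use a faster, lower-memory algorithm, and it should only be used when that is true.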
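The sort order in question can be sketched in Python as follows. This is an illustration of the ordering only, assuming tab-separated BED lines with no header, track, or comment lines. The function names are hypothetical. Chromosome names are compared as plain strings, which matches `sort -k1,1` in the C locale.

```python
def bed_sort_key(line):
    """Key equivalent to `sort -k1,1 -k2,2n`: chromosome name, then numeric start."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[1]))


def sort_bed_lines(lines):
    """Return the lines ordered by chromosome and start position."""
    return sorted(lines, key=bed_sort_key)


def is_sorted(lines):
    """Cheap check to run before relying on -sorted."""
    keys = [bed_sort_key(line) for line in lines]
    return all(x <= y for x, y in zip(keys, keys[1:]))


if __name__ == "__main__":
    records = ["chr2\t5\t10\n", "chr1\t100\t200\n", "chr1\t20\t30\n"]
    print(is_sorted(records))
    print("".join(sort_bed_lines(records)), end="")
```

Note that string comparison places `chr10` before `chr2`. What matters for bedtools is that all input files use the same chromosome order.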
SUBCOMMANDS
Bedtools is organized around subcommands, each designed for a specific task. Common subcommands include intersect, coverage, map, genomecov, and jaccard. Refer to the bedtools documentation for a complete list and detailed explanation of each subcommand.
FILE FORMATS
Bedtools supports a variety of file formats for genomic data, including BED, GFF/GTF, VCF, BAM, and BEDPE. Understanding the nuances of each format is crucial for using Bedtools effectively.
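One such nuance is the coordinate system: BED uses 0-based, half-open intervals, whereas GFF/GTF uses 1-based, inclusive intervals. The sketch below shows the conversion for a single interval. The function names are hypothetical, and the code is not part of bedtools.

```python
def gff_to_bed_coords(start, end):
    """Convert 1-based inclusive (GFF/GTF) coordinates to 0-based half-open (BED)."""
    return start - 1, end


def bed_to_gff_coords(start, end):
    """Convert 0-based half-open (BED) coordinates to 1-based inclusive (GFF/GTF)."""
    return start + 1, end


def bed_length(start, end):
    """Length in bases of a BED interval."""
    return end - start


if __name__ == "__main__":
    # The first 100 bases of a chromosome in each convention.
    bed_start, bed_end = gff_to_bed_coords(1, 100)
    print(bed_start, bed_end, bed_length(bed_start, bed_end))
```

Only the start coordinate shifts between the two conventions; the end coordinate is numerically the same.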
BEST PRACTICES
For large datasets, consider using compressed files (e.g., BGZF) and indexing them with `tabix` to improve performance. Check the output on a small, known test case to confirm that the command performs the intended operation. Use `bedtools --help` or `man bedtools` for an overview, and `bedtools <subcommand> -h` for the options of a specific subcommand.
HISTORY
Bedtools was originally developed by Aaron Quinlan and first released around 2009. Its development was driven by the need for efficient, flexible tools for manipulating genomic intervals, which are essential for many genomic analyses. Since then it has received contributions from numerous developers, with improvements in speed, new features, and better handling of various file formats. It remains widely used and actively maintained.