LinuxCommandLibrary

gemini


TLDR

Start a REPL session to chat interactively

$ gemini

Send the output of another command to Gemini and exit immediately
$ [echo "Summarize the history of Rome"] | gemini [[-p|--prompt]]

Override the default model (default: gemini-2.5-pro)
$ gemini [[-m|--model]] [gemini-2.5-flash]

Run inside a sandbox container
$ gemini [[-s|--sandbox]]

Execute a prompt then stay in interactive mode
$ gemini [[-i|--prompt-interactive]] "[Give me an example of recursion in Python]"

Include all files in context
$ gemini [[-a|--all-files]]

Show memory usage in status bar
$ gemini --show-memory-usage

SYNOPSIS

gemini <subcommand> [options] [arguments]

PARAMETERS

--help
    Display a help message for the main command or a specific subcommand, providing usage details and available options.

--version
    Show the version number of the installed gemini tool.

load options
    Options specific to the load subcommand, used for importing VCF data and its annotations into a gemini database. Examples include -v to specify the input VCF file, -t to name the tool that produced the functional annotations (e.g., snpEff or VEP), and --cores for parallel processing during data loading.

query options
    Options specific to the query subcommand, used for retrieving and filtering data from an existing gemini database. Key options include -q "SELECT ..." for the SQL query string, --gt-filter for genotype-based filtering, and --header to print the column names above the results.

annotate options
    Options specific to the annotate subcommand, used for adding custom annotations from an external, tabix-indexed annotation file to the variants in a gemini database. Key options include -f for the annotation file, -c for the name of the new column, and -a for the annotation mode (boolean, count, or extract).

DESCRIPTION

The `gemini` command-line tool described here is GEMINI (GEnome MINIng), a flexible utility for analyzing and interpreting genomic variation data. It loads genetic variant information, typically from VCF (Variant Call Format) files, into an indexed SQLite database.

Once the data is loaded, `gemini` runs SQL queries against it, supporting tasks such as filtering variants by population frequency, functional annotation, or inheritance pattern, and combining them with phenotypic data. This simplifies identifying candidate disease-causing variants and exploring large-scale genomic datasets.

CAVEATS

The `gemini` command is not a standard, pre-installed Linux utility. It refers to a specialized bioinformatics software package (GEMINI - GEnome MINIng) that requires explicit installation, typically via Python's package manager (`pip`) or Conda. Its usage is highly specific to genomic data analysis and is not intended for general system administration or common command-line tasks. Users must also manage substantial input data (VCF files, reference genomes, annotation databases) for its effective use.

SUBCOMMAND DRIVEN ARCHITECTURE

Similar to popular version control systems like `git` or package managers like `apt`, `gemini` operates primarily through a series of distinct subcommands (e.g., load, query, annotate, stats, comp_hets, autosomal_recessive). Each subcommand performs a specific, specialized function in the variant analysis pipeline, and comes with its own set of dedicated options and arguments.
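The subcommand pattern described above can be sketched with Python's standard `argparse` module. This is a minimal illustration, not GEMINI's actual source: the program name `toy-gemini`, the function `build_parser`, and the reduced option set (only `-v` for `load` and `-q` for `query`) are assumptions chosen for the example.

```python
import argparse


def build_parser():
    """Build a toy CLI in which each subcommand owns its own options."""
    parser = argparse.ArgumentParser(prog="toy-gemini")  # hypothetical name
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "load" subcommand: takes an input VCF and a database path.
    load = subparsers.add_parser("load", help="import a VCF into a database")
    load.add_argument("-v", dest="vcf", required=True, help="input VCF file")
    load.add_argument("db", help="database file to create")

    # "query" subcommand: takes an SQL string and a database path.
    query = subparsers.add_parser("query", help="run SQL against a database")
    query.add_argument("-q", dest="sql", required=True, help="SQL query")
    query.add_argument("db", help="database file to read")

    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can run anywhere without real files.
args = build_parser().parse_args(["load", "-v", "sample.vcf", "sample.db"])
print(args.command, args.vcf, args.db)
```

Because each subparser declares its own arguments, `-v` is accepted only after `load` and `-q` only after `query`, which mirrors how each subcommand comes with its own dedicated options.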

SQLITE DATABASE INTEGRATION

A core design principle and powerful feature of gemini is its reliance on SQLite. All genomic variant data, along with extensive associated annotations, are loaded and stored within a single, portable SQLite database file. This architecture allows for rapid, flexible, and complex SQL-based queries, enabling highly intricate filtering and data extraction that would be challenging or inefficient with traditional flat file formats.
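The database-centric idea can be illustrated with Python's built-in `sqlite3` module. The sketch below is a toy under stated assumptions: the `variants` table, its columns (`chrom`, `pos`, `ref`, `alt`, `gene`, `impact_severity`, `aaf`), the sample rows, and the helper `rare_high_impact` are hypothetical simplifications and do not reproduce GEMINI's real schema.

```python
import sqlite3


def rare_high_impact(conn, max_aaf=0.01):
    """Return high-impact variants at or below an allele-frequency cutoff."""
    cur = conn.execute(
        "SELECT chrom, pos, ref, alt, gene FROM variants "
        "WHERE impact_severity = 'HIGH' AND aaf <= ? "
        "ORDER BY chrom, pos",
        (max_aaf,),
    )
    return cur.fetchall()


# An in-memory database stands in for the single portable database file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
    "gene TEXT, impact_severity TEXT, aaf REAL)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("chr1", 1000, "A", "G", "GENE_A", "HIGH", 0.001),
        ("chr1", 2000, "C", "T", "GENE_B", "LOW", 0.002),
        ("chr2", 500, "G", "A", "GENE_C", "HIGH", 0.25),
    ],
)
# An index on a frequently filtered column keeps lookups fast as data grows.
conn.execute("CREATE INDEX idx_variants_gene ON variants (gene)")

rows = rare_high_impact(conn)
for row in rows:
    print(row)
```

Combining an annotation filter with a frequency filter takes a single `WHERE` clause here, whereas the same selection on a flat file would need custom parsing for each field.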

HISTORY

The GEMINI (GEnome MINIng) project was initiated in Aaron Quinlan's lab, which was at the University of Virginia at the time and later moved to the University of Utah, with significant development contributions from others in the bioinformatics community. It was conceived to address the growing challenges of efficiently querying, filtering, and prioritizing variants from large-scale next-generation sequencing data. The first stable versions of the software emerged around 2013-2014, offering a database-centric approach to variant analysis in contrast to earlier file-based methods, and it quickly gained traction in the genomics research community.

SEE ALSO

vcftools(1), bcftools(1), samtools(1), sqlite3(1), tabix(1)
