gemini
TLDR
Start a REPL session to chat interactively
Send the output of another command to Gemini and exit immediately
Override the default model (default: gemini-2.5-pro)
Run inside a sandbox container
Execute a prompt then stay in interactive mode
Include all files in context
Show memory usage in status bar
SYNOPSIS
gemini <subcommand> [options] [arguments]
PARAMETERS
--help
Display a help message for the main command or a specific subcommand, providing usage details and available options.
--version
Show the version number of the installed gemini tool.
load options
Options specific to the load subcommand, which imports a VCF file and its annotations into a gemini database. Examples include -v to name the input VCF, -t to state which tool annotated it (for example snpEff or VEP), -p to attach a PED file of sample information, and --cores to parallelize loading.
query options
Options specific to the query subcommand, which retrieves and filters data from an existing gemini database. Key options include -q "SELECT ..." for the SQL query string, --gt-filter for genotype-based filtering, and --header to print the column names as the first line of output.
annotate options
Options specific to the annotate subcommand, which adds columns to an existing gemini database from an external annotation file (a tabix-indexed BED or VCF file, for example one derived from dbSNP, ClinVar, or ExAC). Key options include -f for the annotation file, -c for the name of the new column, and -a for the annotation type (boolean, count, or extract).
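As a rough illustration of how subcommand-specific options combine, the sketch below assembles `load` and `query` command lines as argument lists in Python. It does not run gemini. The helper functions, file names, sample name, and column names are hypothetical, and the flags shown (-v, -t, --cores, -q, --header, --gt-filter) are assumed from the GEMINI documentation; confirm them with `gemini <subcommand> --help` on the installed version.

```python
import shlex


def load_command(vcf, db, cores=1, annotation_type=None):
    """Build an argv list for `gemini load` (flags are assumptions, see above)."""
    argv = ["gemini", "load", "-v", vcf]
    if annotation_type:
        argv += ["-t", annotation_type]
    if cores > 1:
        argv += ["--cores", str(cores)]
    argv.append(db)  # the database file is the final positional argument
    return argv


def query_command(db, sql, gt_filter=None, header=True):
    """Build an argv list for `gemini query`."""
    argv = ["gemini", "query", "-q", sql]
    if header:
        argv.append("--header")
    if gt_filter:
        argv += ["--gt-filter", gt_filter]
    argv.append(db)
    return argv


if __name__ == "__main__":
    # Hypothetical file names and sample name, for illustration only.
    load = load_command("trio.vcf", "trio.db", cores=4, annotation_type="snpEff")
    query = query_command(
        "trio.db",
        "SELECT chrom, start, end, ref, alt FROM variants",
        gt_filter="gt_types.child == HET",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(load))
    print(shlex.join(query))
```

Keeping each command as a list means it can be passed directly to `subprocess.run` without shell quoting problems around the SQL string.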
DESCRIPTION
The `gemini` command-line tool described here is GEMINI (GEnome MINIng), a framework for analyzing and interpreting genomic variation data. It loads genetic variants, typically from VCF (Variant Call Format) files, together with their annotations into an indexed SQLite database.
Once the data is loaded, `gemini` runs SQL queries against it. This supports tasks such as filtering variants by population frequency, functional annotation, or inheritance pattern, and combining variant data with sample phenotypes. The aim is to make it easier to identify candidate disease-causing variants and to explore large genomic datasets.
CAVEATS
The `gemini` command is not a standard, pre-installed Linux utility. It belongs to a specialized bioinformatics package (GEMINI, GEnome MINIng) that must be installed explicitly, typically with Conda or the project's own installer script. It is intended for genomic data analysis, not for system administration or general command-line work. Users must also manage substantial input data (VCF files and annotation databases) to use it effectively.
SUBCOMMAND DRIVEN ARCHITECTURE
Like version control systems such as `git` or package managers such as `apt`, `gemini` is organized around distinct subcommands (e.g., load, query, annotate, stats, comp_hets, autosomal_recessive). Each subcommand performs one step of the variant analysis pipeline and has its own options and arguments.
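The subcommand pattern can be sketched with Python's `argparse` subparsers. This is a generic illustration of the design, not GEMINI's actual source code: the program name `toolkit`, the handler functions, and the reduced option set are all hypothetical.

```python
import argparse


def run_load(args):
    # Placeholder handler: a real tool would parse the VCF and fill a database.
    return f"load {args.vcf} into {args.db}"


def run_query(args):
    # Placeholder handler: a real tool would execute the SQL against the database.
    return f"query {args.db}: {args.sql}"


def build_parser():
    parser = argparse.ArgumentParser(prog="toolkit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subcommand gets its own parser, and therefore its own options.
    load = subparsers.add_parser("load", help="import a VCF into a database")
    load.add_argument("-v", dest="vcf", required=True)
    load.add_argument("db")
    load.set_defaults(func=run_load)

    query = subparsers.add_parser("query", help="run SQL against a database")
    query.add_argument("-q", dest="sql", required=True)
    query.add_argument("db")
    query.set_defaults(func=run_query)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    # Dispatch to whichever handler the chosen subcommand registered.
    return args.func(args)


if __name__ == "__main__":
    print(main(["load", "-v", "sample.vcf", "sample.db"]))
    print(main(["query", "-q", "SELECT 1", "sample.db"]))
```

Because every subcommand registers its own handler and options, new pipeline steps can be added without touching the existing ones.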
SQLITE DATABASE INTEGRATION
A core design principle of `gemini` is its reliance on SQLite. All variant data and the associated annotations are stored in a single, portable SQLite database file. Because the data sits in indexed tables, it can be filtered and extracted with ordinary SQL, including multi-condition queries that would be awkward or slow to express against flat files.
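Because the database is a regular SQLite file, it can in principle be opened with any SQLite client. The sketch below uses Python's built-in `sqlite3` module on an in-memory toy database. The `variants` table here is a simplified stand-in with made-up rows and a handful of assumed column names; a real GEMINI database has many more columns and tables.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a tiny, hypothetical variants table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants ("
        "chrom TEXT, start INTEGER, ref TEXT, alt TEXT, "
        "gene TEXT, impact_severity TEXT)"
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("chr1", 1000, "A", "G", "GENE_A", "HIGH"),
            ("chr1", 2000, "C", "T", "GENE_B", "LOW"),
            ("chr2", 500, "G", "A", "GENE_C", "HIGH"),
        ],
    )
    conn.commit()
    return conn


def high_impact_variants(conn):
    """Return (chrom, start, gene) for variants flagged as high impact."""
    cur = conn.execute(
        "SELECT chrom, start, gene FROM variants "
        "WHERE impact_severity = ? ORDER BY chrom, start",
        ("HIGH",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for row in high_impact_variants(conn):
        print(row)
    conn.close()
```

To try the same idea on a real database, replace `":memory:"` with the path to the database file and adjust the column names to match its schema.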
HISTORY
The GEMINI (GEnome MINIng) project was initiated in Aaron Quinlan's lab, first at the University of Virginia and later at the University of Utah, with significant contributions from others in the bioinformatics community. It was conceived to make it easier to query, filter, and prioritize variants from large next-generation sequencing datasets. The first stable versions appeared around 2013-2014. Their database-centric approach to variant analysis contrasted with earlier file-based methods and was adopted quickly in the genomics research community.