git-annex
Manage large files with Git
TLDR
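Initialize git-annex in an existing Git repository
git annex init
Add a file or directory
git annex add path/to/file_or_directory
Show the current status of a file or directory
git annex status path/to/file_or_directory
Synchronize a local repository with a remote
git annex sync remote_name
Get a file or directory
git annex get path/to/file_or_directory
Display help
git annex help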
SYNOPSIS
git annex subcommand [options...] [path...]
PARAMETERS
init [description]
Initialize git-annex in an existing Git repository, optionally labeling it with a free-form description
add [options] files
Add files to the annex
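git rm files
Remove annexed files (git-annex has no rm subcommand of its own; plain git rm removes the pointers, and git annex unused / dropunused can then reclaim the content)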
drop [options] files
Remove content from local annex (keeps pointer)
get [options] files
Retrieve file content to local annex
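sync [options] [remote...]
Commit local changes and exchange Git and git-annex branch data with remotes (add --content to transfer file content as well)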
copy/move [options] files
Copy or move content to (--to remote) or from (--from remote) another repository
lock/unlock files
Lock files to prevent modification or unlock for editing
status
Show the working-tree status of annexed files (git annex info reports repository-level details)
unused
Find content in the local annex that no branch, tag, or index entry references (reclaim it with dropunused)
--auto
With get, drop, or copy, act only where needed to satisfy numcopies and preferred-content settings
--fast
Skip expensive checks
--json
Output in JSON format
--debug
Enable debug logging
--help / -h
Show help for command or subcommand
DESCRIPTION
git-annex extends Git to manage large files, datasets, and binaries efficiently without bloating the repository history. Instead of storing file contents in Git, it commits small symlinks or pointer files, while the content itself lives in the repository's annex (.git/annex/objects) and can be copied to other Git repositories (local or over SSH) and to special remotes such as S3, Glacier, WebDAV, and BitTorrent.
This enables distributed storage: every repository records which locations hold each file's content. Key workflows include adding files (git annex add), exchanging branch and location metadata (git annex sync), retrieving content (git annex get), and removing a local copy while keeping the pointer (git annex drop). Files can be locked (symlinks to read-only content, the default) or unlocked (regular, editable files); the older direct mode was removed in repository version 7. Preferred-content expressions, repository groups, and the --auto option let git-annex decide what each repository should hold.
Well suited to scientific data, media libraries, backups, and collaborative projects. Ordinary Git commands version the pointers, and location tracking lives in a dedicated git-annex branch. Requires a reasonably recent Git, since modern releases rely on Git 2.x features. Underlies DataLad and similar reproducible-research tools.
CAVEATS
Total storage use multiplies as content is copied to more remotes; the learning curve is steep for special remotes and preferred-content policies; a consistent Git workflow is required; locked files are read-only symlinks and must be unlocked before editing.
MODES
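Locked (default): files are symlinks to read-only content in .git/annex/objects.
Unlocked: files are regular, editable files in the working tree (git annex unlock; git annex lock reverts).
Direct: the former mode that kept content directly in the working tree (git annex direct) was removed in repository version 7; unlocked files replace it.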
KEY CONCEPT: PREFERRED CONTENT
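Each repository can declare which content it wants with a preferred-content expression, set via git annex wanted (for example, git annex wanted here 'include=*.csv'). git annex get/drop --auto and git annex sync --content honor these expressions.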
EXAMPLE WORKFLOW
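From a Git clone whose origin remote is also a git-annex repository:
git annex init "my laptop"
git annex add largefile.dat
git annex sync origin
git annex copy --to origin largefile.dat
git annex drop largefile.dat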
HISTORY
Developed by Joey Hess starting in 2010 to meet his own need to handle large files with Git. Actively maintained (version 10 and later as of 2023) and used by tools such as DataLad. Key milestones include special remotes (2011), direct mode (later removed in favor of unlocked files), and encryption for special remotes.


