dolt
Version-controlled SQL database
TLDR
Execute a dolt subcommand
List available subcommands
SYNOPSIS
dolt [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] [ARGS]
Dolt operations are performed using subcommands, similar to Git. Common invocations include:
dolt clone repository_url
dolt add table_name
dolt commit -m "message"
dolt diff table_name
dolt sql -q "SELECT * FROM my_table"
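The invocations above can be combined into a first commit. The following is a minimal sketch, assuming dolt is installed and on PATH; the function name dolt_first_commit, the directory demo_db, the table layout, and the identity passed to dolt init are placeholders chosen for illustration. The body runs in a subshell so the caller's working directory is left unchanged.

```shell
# Sketch: create a repository, add a table, and record a first commit.
# All names below (demo_db, my_table, the user identity) are illustrative.
dolt_first_commit() (
    repo_dir="${1:-demo_db}"
    mkdir -p "$repo_dir" && cd "$repo_dir" || exit 1

    # Initialize a new Dolt repository in the current directory.
    dolt init --name "Example User" --email "user@example.com" || exit 1

    # Change the working set with ordinary SQL.
    dolt sql -q "CREATE TABLE my_table (id INT PRIMARY KEY, name VARCHAR(64))" || exit 1
    dolt sql -q "INSERT INTO my_table VALUES (1, 'first row')" || exit 1

    # Review the uncommitted change, then stage and commit it.
    dolt diff my_table
    dolt add my_table
    dolt commit -m "Create my_table with one row"
)
```

Nothing runs until the function is called, for example with dolt_first_commit demo_db.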
PARAMETERS
-h, --help
Show help for the command or subcommand.
-q, --quiet
Suppress most output from Dolt.
--verbose
Enable verbose output, showing more details about operations.
--config
Use a specific configuration file instead of the default.
--data-dir
Specify the directory containing the Dolt database(s) to operate on. Defaults to the current directory.
--debug
Enable debug logging for detailed internal operation visibility.
DESCRIPTION
Dolt is a SQL database that supports Git-like version control capabilities. It allows users to clone, branch, merge, push, and pull data just like source code. This brings powerful collaboration and auditing features to datasets, enabling data engineers, analysts, and developers to manage data with the same rigorous version control practices applied to software development.
It provides a command-line interface modeled on Git, as well as a MySQL-compatible server mode for standard SQL operations. Every change to the database is recorded in the commit history, which makes operations reversible and allows conflicting edits to the same data to be resolved when branches are merged.
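The server mode can be tried as follows. This is a rough sketch, assuming dolt and a mysql command-line client are installed, that demo_db is an existing Dolt repository, that port 3306 is free, and that the server accepts the default root user without a password. The function name dolt_serve_and_query and the fixed two-second wait are illustrative simplifications, not a recommended way to detect readiness.

```shell
# Sketch: start the MySQL-compatible server, run one query, then stop it.
# demo_db, the port, and the root login are assumptions for illustration.
dolt_serve_and_query() (
    cd "${1:-demo_db}" || exit 1

    # Start the SQL server in the background and remember its process ID.
    dolt sql-server --host 127.0.0.1 --port 3306 &
    server_pid=$!

    # Crude wait for the server to accept connections.
    sleep 2

    # Any MySQL client can connect; here the stock mysql CLI lists databases.
    mysql --host 127.0.0.1 --port 3306 --user root -e "SHOW DATABASES"

    # Shut the server down again.
    kill "$server_pid"
)
```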
CAVEATS
Dolt is optimized for data versioning and collaboration rather than for the high-volume transactional workloads typical of traditional OLTP databases. Although it is MySQL-compatible, its performance characteristics may differ, particularly for very large datasets or complex analytical queries. Because its storage model retains the complete history of changes, which is what makes diffing and merging possible, it can use more disk space than a database that stores only the current state. Its ecosystem and tool integrations are still maturing compared to long-established relational database systems.
CORE CONCEPTS
Dolt operates on several key concepts borrowed from Git, including:
Working Set: The current state of your data that you are actively modifying.
Staging Area (Index): An intermediate area where you prepare changes before committing them.
Commit Graph: A historical record of all changes, represented as a directed acyclic graph (DAG) of commits.
Branches: Pointers to specific commits, allowing for parallel lines of development or experimentation with data.
Remotes: Remote repositories (e.g., on DoltHub or a private server) that enable collaboration and data sharing.
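Each concept above corresponds to a subcommand. The sketch below walks through them in order, assuming dolt is installed and that demo_db is a repository containing a table my_table with columns (id, name); the function name dolt_concepts_tour, the branch name experiment, and the optional remote URL argument are hypothetical.

```shell
# Sketch: one command per core concept. Names are placeholders.
# Usage: dolt_concepts_tour [repo_dir] [remote_url]
dolt_concepts_tour() (
    cd "${1:-demo_db}" || exit 1
    remote_url="$2"

    # Branches: create and switch to a new branch for the experiment.
    dolt checkout -b experiment || exit 1

    # Working set: modify data, then inspect what has changed.
    dolt sql -q "INSERT INTO my_table VALUES (2, 'second row')"
    dolt status

    # Staging area: select the changes that go into the next commit.
    dolt add my_table

    # Commit graph: record the staged changes, then view recent history.
    dolt commit -m "Add second row"
    dolt log -n 5

    # Branches: list all branches in the repository.
    dolt branch

    # Remotes: register a remote and publish the branch, if a URL was given.
    if [ -n "$remote_url" ]; then
        dolt remote add origin "$remote_url"
        dolt push origin experiment
    fi
)
```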
COMMON USE CASES
Dolt is particularly useful for:
Data Auditing: Easily track every change made to your data, who made it, and when.
Collaborative Data Analysis: Data scientists and analysts can work on shared datasets, branch off for experiments, and merge changes back.
Reproducible Research: Ensure that data used for analysis or experiments can be precisely reproduced at any point in time.
Data Migration and Rollback: Manage database schema and data changes with the ability to easily revert to previous states or branch for testing migrations.
Data Sharing: Distribute datasets with their full history, enabling others to clone and contribute.
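The auditing and rollback use cases can be sketched with a few commands. This example assumes dolt is installed and that demo_db holds a table my_table with at least two commits; the function name dolt_audit_and_rollback and the branch name rollback-test are placeholders. The reset is performed on a scratch branch so the original branch is not altered.

```shell
# Sketch: inspect history, query a past state, and roll back on a scratch branch.
# demo_db, my_table, and rollback-test are illustrative names.
dolt_audit_and_rollback() (
    cd "${1:-demo_db}" || exit 1

    # Auditing: who last changed each row, and what the recent commits were.
    dolt blame my_table
    dolt sql -q "SELECT commit_hash, committer, date, message FROM dolt_log LIMIT 5"

    # Reproducibility: read the table as it was one commit ago.
    dolt sql -q "SELECT * FROM my_table AS OF 'HEAD~1'"

    # Show what the latest commit changed in the table.
    dolt diff HEAD~1 HEAD my_table

    # Rollback: move a scratch branch back one commit, leaving the source branch intact.
    dolt checkout -b rollback-test || exit 1
    dolt reset --hard HEAD~1
)
```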
HISTORY
Dolt was developed by DoltHub, a company founded by Tim Sehn, Brian Hendriks, and Aaron Son, with the goal of applying Git-like version control principles to SQL databases. The project was first publicly released in 2019. It was designed from the ground up for data versioning, branching, and merging, addressing common challenges that data professionals face in managing evolving datasets.