dolt-blame
Show who last modified each database row
TLDR
Display the latest commit for each row of a table
dolt blame <table>
Display the latest commit for each row of a table as of a specified commit
dolt blame <commit> <table>
Display help
dolt blame --help
SYNOPSIS
dolt blame [<rev>] <tablename>
PARAMETERS
<rev>
The revision to start annotating from, such as a commit hash, a branch name (main), a tag (v1.0), or a relative reference (HEAD~1). If omitted, dolt blame annotates the table as of the current HEAD.
<tablename>
The table to annotate, such as mytable. dolt blame works on tables in the Dolt repository, not on files such as data/file.csv, and takes one table per invocation.
--help
Display help for the command.
Apart from --help, dolt blame takes no options. Flags familiar from git blame, such as -L, --porcelain, --ignore-rev, and --ignore-revs-file, are not available. To filter rows or get machine-readable output, query the dolt_blame_<tablename> system table from SQL instead.
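Scripts that need blame output can call the CLI directly. The following is a minimal Python sketch. It assumes the dolt binary is on PATH and that the working directory, or the hypothetical repo_dir argument, is a Dolt repository. The helper names blame_argv and run_blame are illustrative and not part of Dolt. The sketch shows the argument order: the optional revision comes before the table name.

```python
import subprocess
from typing import List, Optional


def blame_argv(table: str, rev: Optional[str] = None) -> List[str]:
    """Build the argument vector for `dolt blame [<rev>] <tablename>`.

    Hypothetical helper for illustration; not part of Dolt itself.
    """
    argv = ["dolt", "blame"]
    if rev is not None:
        # The optional revision (commit, branch, tag, HEAD~N) precedes the table.
        argv.append(rev)
    argv.append(table)
    return argv


def run_blame(table: str, rev: Optional[str] = None, repo_dir: str = ".") -> str:
    """Run dolt blame inside a Dolt repository and return its text output.

    Assumes the `dolt` binary is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        blame_argv(table, rev),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, print(run_blame("mytable", rev="HEAD~1")) would annotate mytable as it existed one commit before HEAD.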
DESCRIPTION
dolt blame is a command in Dolt, a SQL database with Git-like version control features. Where git blame annotates each line of a file, dolt blame annotates each row of a table with the commit that last modified it, along with that commit's author and date. This makes it useful for understanding the history of data changes: who last modified a particular row, when, and in which commit. Given a revision, it annotates the table as it existed at that revision rather than at HEAD. The same information is exposed in SQL through the dolt_blame_<tablename> system table, which is convenient for filtering results or producing machine-readable output. dolt blame is a practical tool for auditing data changes and for collaborating on data-driven projects in a version-controlled environment.
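Because the blame data is also exposed as the dolt_blame_<tablename> system table, a script can request it as CSV through dolt sql and parse it without scraping the human-oriented table output. This is a minimal Python sketch. It assumes the dolt binary is on PATH and that table names contain only letters, digits, and underscores. The helper names blame_query, parse_csv, and blame_rows are hypothetical. The exact column set (the primary key columns plus commit metadata) may vary between Dolt versions, so the parser keeps every column generically instead of hard-coding names.

```python
import csv
import io
import subprocess
from typing import Dict, List


def blame_query(table: str) -> str:
    """SQL reading row-level blame from the dolt_blame_<tablename> system table.

    Assumption: only simple identifiers are accepted, to avoid building unsafe SQL.
    """
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM dolt_blame_{table}"


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Turn CSV output (header row first) into one dict per blamed row."""
    return list(csv.DictReader(io.StringIO(text)))


def blame_rows(table: str, repo_dir: str = ".") -> List[Dict[str, str]]:
    """Run the query with `dolt sql`, asking for CSV via -r (--result-format).

    Hypothetical helper; requires `dolt` on PATH and a Dolt repository in repo_dir.
    """
    out = subprocess.run(
        ["dolt", "sql", "-q", blame_query(table), "-r", "csv"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_csv(out)
```

Going through SQL also makes it easy to narrow the result, for example by adding a WHERE clause on the primary key to the query.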
CAVEATS
Unlike git blame, which operates on lines of text files, dolt blame operates on table rows, which are identified by primary key rather than by position. It reports only the most recent commit that changed each row as a whole; it does not attribute individual column values and does not trace schema changes. Blaming very large tables or tables with a long history can be resource-intensive and slow.
DATA VS. SCHEMA BLAME
dolt blame traces changes to a table's data rows only; it does not annotate the schema definition (for example, column additions or deletions). To see how a table's schema changed over time, use dolt diff with its --schema option, or inspect the commit history with dolt log.
PRIMARY KEY IMPORTANCE
In Dolt, primary keys identify rows across versions. dolt blame relies on them to track the lineage of each row: a row keeps its identity when its non-key values change, whereas changing a row's primary key value appears as the deletion of one row and the insertion of another.
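To make the role of primary keys concrete, here is a conceptual Python sketch of row-level blame over a linear history of table snapshots. It is not Dolt's implementation, which walks the commit graph (including merges) over its own storage format. The snapshot representation and the blame function are assumptions for illustration only. A row is attributed to the most recent commit in which its key appeared or its non-key values changed.

```python
from typing import Dict, List, Tuple

# Illustrative model, not Dolt internals: a snapshot maps primary key -> non-key values.
Snapshot = Dict[object, Tuple]


def blame(history: List[Tuple[str, Snapshot]]) -> Dict[object, str]:
    """Return, for each row in the newest snapshot, the commit that last changed it.

    `history` holds (commit_id, snapshot) pairs ordered oldest to newest and is
    assumed to be linear; this is a conceptual sketch, not Dolt's algorithm.
    """
    last_changed: Dict[object, str] = {}
    previous: Snapshot = {}
    for commit_id, snapshot in history:
        for key, values in snapshot.items():
            # Rows are matched by primary key: a new key, or the same key with
            # different values, is attributed to this commit.
            if key not in previous or previous[key] != values:
                last_changed[key] = commit_id
        previous = snapshot
    # Only rows that still exist in the newest snapshot are reported.
    return {key: last_changed[key] for key in previous}
```

Blaming from an older revision corresponds to truncating the history at that commit, for example blame(history[:2]). Because rows are matched by key, giving a row a new primary key makes it look like a fresh insert attributed to the commit that changed the key.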
HISTORY
Dolt is a SQL database built on Git principles, developed by Liquidata, Inc., which later renamed itself DoltHub, Inc. It was first introduced around 2018-2019, aiming to bring the benefits of Git-like version control (branching, merging, diffing, blame) to structured data. dolt blame was a natural inclusion, mirroring the popular git blame command but adapted to operate on the rows of SQL tables. Its development reflects the broader trend of applying version control paradigms to data management, beyond traditional code repositories.
SEE ALSO
git blame(1), dolt log(1), dolt diff(1), dolt checkout(1)