kaggle-datasets
Manage Kaggle datasets
TLDR
List all datasets owned by a user or organization
Search for datasets by name
Download a dataset
Create a public dataset
Download the metadata of a dataset
Initialize a metadata file for a new dataset
Delete a dataset
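The tasks above can be mapped to command lines roughly as in the following sketch. The flags are assumptions based on the official client's help output and may differ between versions, so check `kaggle datasets <subcommand> -h` before relying on them. The user name, slug, and folder are hypothetical placeholders.

```python
import shlex

# Hypothetical placeholders; substitute real values.
USER = "some-user"
SLUG = "some-user/some-dataset"
FOLDER = "./my-dataset"

# Task -> argument vector. Flags are assumed from the official client's
# help output and should be verified against the installed version.
TLDR_COMMANDS = {
    "list datasets owned by a user": ["kaggle", "datasets", "list", "--user", USER],
    "search for datasets by name": ["kaggle", "datasets", "list", "-s", "titanic"],
    "download a dataset": ["kaggle", "datasets", "download", SLUG],
    "create a public dataset": ["kaggle", "datasets", "create", "-p", FOLDER, "--public"],
    "download dataset metadata": ["kaggle", "datasets", "metadata", "-p", FOLDER, SLUG],
    "initialize a metadata file": ["kaggle", "datasets", "init", "-p", FOLDER],
    "delete a dataset": ["kaggle", "datasets", "delete", SLUG],
}


def as_shell(argv):
    """Render an argument vector as a shell-quoted command line."""
    return shlex.join(argv)


# Only prints the commands; nothing is executed against Kaggle.
for task, argv in TLDR_COMMANDS.items():
    print(f"{task}: {as_shell(argv)}")
```

Each list can be passed to `subprocess.run(argv, check=True)` to execute it once credentials are configured.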
SYNOPSIS
kaggle datasets <list | files | download | create | init | version | status | delete | update | metadata | private | public | roles | add-tag | remove-tag> [options]
PARAMETERS
list
Lists available datasets on Kaggle, with options for filtering by search term, user, file type, license, tags, and size, and for sorting and paging the results.
files <dataset-slug>
Displays a list of all files contained within a specified dataset.
download <dataset-slug>
Downloads files from a specified dataset to the local filesystem. Supports downloading specific files or all dataset content, and can skip existing files.
create -p <path>
Creates a new dataset on Kaggle from files located in a local directory. This operation requires a correctly formatted dataset-metadata.json file within the directory for initial setup.
init -p <path>
Initializes a local directory for a new dataset by generating a template dataset-metadata.json file, guiding the user in defining dataset properties.
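As a rough illustration of what the metadata file holds, the sketch below writes a minimal dataset-metadata.json by hand. It assumes the file needs at least a title, an id in ownerUsername/dataset-name form, and a licenses list, which matches the template the client generates as far as I know. The helper name `write_metadata` and its arguments are hypothetical.

```python
import json
from pathlib import Path


def write_metadata(folder, owner, slug, title, license_name="CC0-1.0"):
    """Write a minimal dataset-metadata.json into `folder` and return its path.

    Field names (title, id, licenses) are assumed from the template that
    `kaggle datasets init` generates; verify against your client version.
    """
    metadata = {
        "title": title,
        "id": f"{owner}/{slug}",
        "licenses": [{"name": license_name}],
    }
    path = Path(folder) / "dataset-metadata.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, `write_metadata("./my-dataset", "some-user", "some-dataset", "Some Dataset")` produces a file that `kaggle datasets create -p ./my-dataset` could then pick up.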
version -p <path> -m <message>
Creates a new version of an existing dataset from the files in a local directory. The directory must contain the dataset-metadata.json file that identifies the dataset, and the message describes the changes in the new version.
status <dataset-slug>
Checks and reports the current upload or processing status of a dataset, useful after creating or updating a dataset.
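A script that uploads a dataset may want to wait until processing has finished. The following is a minimal polling sketch. It assumes that `kaggle datasets status <dataset-slug>` prints a single status word and that `ready` means the dataset is available; both are assumptions to verify against your client version. The function names and the retry parameters are hypothetical.

```python
import subprocess
import time


def cli_status(slug):
    """Return the trimmed output of `kaggle datasets status <slug>`."""
    result = subprocess.run(
        ["kaggle", "datasets", "status", slug],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_ready(slug, get_status=cli_status, interval=5.0,
                     attempts=12, sleep=time.sleep):
    """Poll the status until it equals "ready" or the attempts run out.

    Assumes "ready" is the success status. Returns True on success and
    False if the dataset was not ready after `attempts` checks.
    """
    for attempt in range(attempts):
        if get_status(slug) == "ready":
            return True
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next check
    return False
```

Passing `get_status` and `sleep` as parameters keeps the loop testable without network access.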
delete <dataset-slug>
Permanently deletes a specified dataset from Kaggle. This action is irreversible.
update -p <path>
Updates an existing dataset with new files or modified metadata from a local directory, creating a new dataset version.
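Publishing a new version from a local directory can be sketched as follows. The sketch assumes the official client's `version` subcommand with its `-p` (folder) and `-m` (version notes) options, which is how the client creates dataset versions as far as I know. The helper names and the injected `runner` parameter are hypothetical.

```python
import subprocess


def version_argv(folder, message):
    """Build the argument vector for publishing a new dataset version.

    Assumes `kaggle datasets version -p <folder> -m <message>`.
    """
    if not message.strip():
        raise ValueError("a non-empty version message is required")
    return ["kaggle", "datasets", "version", "-p", folder, "-m", message]


def publish_version(folder, message, runner=subprocess.run):
    """Run the version command; `runner` is injectable for dry runs and tests."""
    return runner(version_argv(folder, message), check=True)
```

The folder is expected to contain the dataset-metadata.json file whose id names the dataset being updated.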
metadata <dataset-slug>
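Downloads the dataset-metadata.json file of a dataset to a local directory. With the --update option, uploads the local metadata file to change properties such as the dataset's title, description, and tags.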
private <dataset-slug>
Changes the visibility of the specified dataset to private, restricting access to only its collaborators.
public <dataset-slug>
Changes the visibility of the specified dataset to public, making it accessible to the entire Kaggle community.
roles <dataset-slug>
Manages user roles and permissions for a dataset, allowing sharing and collaboration control.
add-tag <dataset-slug> <tag>
Adds a specified tag to a dataset, improving its discoverability and categorization.
remove-tag <dataset-slug> <tag>
Removes a specified tag from a dataset.
DESCRIPTION
The kaggle datasets command is part of the Kaggle API client and provides a command-line interface to datasets hosted on Kaggle.com. It lets users list, search, download, create, update, and delete datasets, and manage their metadata and permissions, from the terminal or from scripts. This makes it possible to automate data acquisition, publication, and versioning inside existing data pipelines. Users can also inspect a dataset's files, monitor upload status, and switch a dataset between public and private without opening the Kaggle website.
CAVEATS
Using kaggle datasets requires Kaggle API credentials to be configured first, typically through a kaggle.json file in the ~/.kaggle/ directory. Datasets must be referenced by slug in the form ownerUsername/dataset-name. Downloads and uploads of large datasets can be slow and bandwidth-intensive. Dataset creation and updates depend on an accurate, correctly formatted dataset-metadata.json file.
AUTHENTICATION
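Like the other Kaggle API commands, kaggle datasets requires authentication. Typically a kaggle.json file containing your Kaggle username and API key is placed in the ~/.kaggle/ directory; the file can be downloaded from your Kaggle account settings page. The client can also read credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables. On Unix-like systems, restrict the file's permissions (chmod 600 ~/.kaggle/kaggle.json) so that other users cannot read the key.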
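Before running commands from a script, it can help to check that credentials are in place. The sketch below looks for the KAGGLE_USERNAME and KAGGLE_KEY environment variables first and then for a kaggle.json file, honoring KAGGLE_CONFIG_DIR if set. The lookup order is an assumption of this sketch rather than a statement of the client's exact behavior, and the function name `load_credentials` is hypothetical.

```python
import json
import os
from pathlib import Path


def load_credentials(config_dir=None, env=None):
    """Return {"username": ..., "key": ...} or raise if credentials are missing.

    Lookup order (an assumption of this sketch): environment variables
    first, then kaggle.json in `config_dir`, KAGGLE_CONFIG_DIR, or ~/.kaggle.
    """
    env = os.environ if env is None else env
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return {"username": env["KAGGLE_USERNAME"], "key": env["KAGGLE_KEY"]}

    base = config_dir or env.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle"
    path = Path(base) / "kaggle.json"
    if not path.is_file():
        raise FileNotFoundError(f"no credentials file at {path}")

    data = json.loads(path.read_text(encoding="utf-8"))
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return {"username": data["username"], "key": data["key"]}
```

Calling `load_credentials()` at the start of a pipeline surfaces a missing or incomplete kaggle.json before any upload or download is attempted.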
DATASET SLUG FORMAT
Many kaggle datasets subcommands take a dataset slug that uniquely identifies the target dataset. The slug has the form ownerUsername/dataset-name: the owner's user or organization name, a slash, and the dataset's URL name. For instance, the slug 'kaggle/titanic' would refer to a dataset named titanic owned by the user kaggle.
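A script can catch malformed slugs before calling the client. The check below is deliberately loose: it only requires two non-empty parts separated by a single slash, with no whitespace. It does not reproduce Kaggle's actual rules for allowed characters, and the function name `parse_slug` is hypothetical.

```python
import re

# Two non-empty parts without slashes or whitespace, joined by one slash.
# This is a loose structural check, not Kaggle's real naming rules.
SLUG_RE = re.compile(r"[^/\s]+/[^/\s]+")


def parse_slug(slug):
    """Split 'ownerUsername/dataset-name' into (owner, name) or raise ValueError."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"expected ownerUsername/dataset-name, got {slug!r}")
    owner, name = slug.split("/")
    return owner, name
```

For example, `parse_slug("kaggle/titanic")` yields the owner and the dataset name as separate strings, while a bare name without an owner is rejected.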
HISTORY
The Kaggle API and its command-line client were developed to provide programmatic access to Kaggle's platform beyond the web interface, so that datasets, competition submissions, and kernels can be managed from scripts or CI/CD pipelines. The datasets subcommands cover the lifecycle of a dataset: publishing it, maintaining its versions, and sharing it with the community.
SEE ALSO
kaggle competitions(1), kaggle kernels(1), kaggle models(1), curl(1)


