kaggle-datasets

Manage Kaggle datasets

TLDR

List all datasets owned by a user or organization

$ kaggle [[d|datasets]] list --user [username]

Search for a dataset by name
$ kaggle [[d|datasets]] list [[-s|--search]] "[dataset_name]"

Download a dataset
$ kaggle [[d|datasets]] download "[owner/dataset_name]"

Create a public dataset
$ kaggle [[d|datasets]] create [[-p|--path]] [path/to/dataset] [[-u|--public]]

Download the metadata of a dataset
$ kaggle [[d|datasets]] metadata [owner/dataset_name]

Initialize a metadata file for a dataset
$ kaggle [[d|datasets]] init [[-p|--path]] [path/to/dataset]

Delete a dataset
$ kaggle [[d|datasets]] delete [dataset_name]

SYNOPSIS

kaggle datasets [OPTIONS] <COMMAND> [<ARGS>]...

Commands: create, delete, download, files, init, list, metadata, status, version

PARAMETERS

-h, --help
    Show help message and exit

-p, --path PATH
    Local path to dataset folder (default: current directory)

-m, --message VERSION_NOTES
    Message describing the new dataset version (required by the version command)

-r, --dir-mode MODE
    How to upload subdirectories with create or version: skip, zip, or tar (default: skip)

DESCRIPTION

The kaggle datasets command is part of the official Kaggle API client, a Python-based tool for working with Kaggle's dataset repository from the terminal. It lets data scientists and ML practitioners list and search datasets, download them, create new datasets or versions, and update metadata without using the web interface.

Key features include filtering datasets by search term, owner, file type, license, or tags; downloading entire datasets or specific files; and versioning for iterative improvements. Authentication requires a kaggle.json API token, downloaded from your Kaggle account settings and placed in ~/.kaggle/ with 600 permissions. The client is installed with pip install kaggle and can be scripted in CI/CD pipelines for reproducible workflows.

Common use cases: bulk downloading datasets, automating dataset creation from local files, and querying public datasets by criteria such as file size or license. List output is printed as a table by default, or as CSV with -v/--csv for scripting.
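
For scripting, CSV output from the list command can be reduced to bare dataset references. The sketch below is a minimal example, not part of the Kaggle client: the function name extract_dataset_refs is hypothetical, and it assumes the first CSV row is a header and the first column holds the owner/dataset reference.

```shell
# Hypothetical helper: read CSV (as produced by `kaggle datasets list --csv`)
# on stdin and print one dataset reference per line.
# Assumption: row 1 is a header and column 1 is the owner/dataset reference.
extract_dataset_refs() {
    # Skip the header row, then keep only the first comma-separated field.
    # The reference itself contains no commas, so quoted commas in later
    # columns (for example in titles) do not affect the result.
    tail -n +2 | cut -d',' -f1
}
```

A possible invocation would be kaggle datasets list -s "keyword" --csv | extract_dataset_refs, which requires valid credentials and network access.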

CAVEATS

Requires API credentials in ~/.kaggle/kaggle.json with chmod 600; large downloads may need ample disk space and stable internet; not all datasets are downloadable due to private status or restrictions.

INSTALLATION

pip install kaggle; verify with kaggle --version

AUTHENTICATION

Download kaggle.json from Kaggle > Account > API; mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
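
Before running kaggle in a script, it can help to confirm that the token file exists and is readable only by its owner. The following is a sketch under stated assumptions: check_kaggle_credentials is a hypothetical helper name, and the default path is the ~/.kaggle/kaggle.json location described above.

```shell
# Hypothetical helper: verify that a Kaggle credentials file exists and has
# mode 600. Takes an optional path; defaults to ~/.kaggle/kaggle.json.
# Returns 0 if fine, 1 if the file is missing, 2 if permissions are too open.
check_kaggle_credentials() {
    cred_file="${1:-$HOME/.kaggle/kaggle.json}"
    if [ ! -f "$cred_file" ]; then
        echo "missing: $cred_file" >&2
        return 1
    fi
    # GNU stat takes -c '%a'; BSD/macOS stat takes -f '%Lp'. Try both.
    mode=$(stat -c '%a' "$cred_file" 2>/dev/null || stat -f '%Lp' "$cred_file")
    if [ "$mode" != "600" ]; then
        echo "unsafe permissions ($mode) on $cred_file; run chmod 600" >&2
        return 2
    fi
    echo "ok: $cred_file"
}
```

Calling check_kaggle_credentials with no arguments checks the default location; a non-zero return code can be used to stop a pipeline early.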

COMMON SUBCOMMANDS

list: kaggle datasets list -s "keyword"
download: kaggle datasets download -d username/dataset -p ./data
create: kaggle datasets create -p ./dataset-folder
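
The create command expects a dataset-metadata.json file in the dataset folder; kaggle datasets init writes a template with placeholder values. As an illustration, the sketch below writes a minimal file directly. The function name write_dataset_metadata and the example values are hypothetical, and the license is fixed to CC0-1.0 purely for brevity.

```shell
# Hypothetical helper: write a minimal dataset-metadata.json into a folder.
# Arguments: folder, owner/slug identifier, human-readable title.
# Note: values are inserted verbatim, so they must not contain double
# quotes or backslashes, which would produce invalid JSON.
write_dataset_metadata() {
    folder="$1"
    dataset_id="$2"
    title="$3"
    mkdir -p "$folder"
    cat > "$folder/dataset-metadata.json" <<EOF
{
  "title": "$title",
  "id": "$dataset_id",
  "licenses": [{"name": "CC0-1.0"}]
}
EOF
}
```

After a call such as write_dataset_metadata ./dataset-folder "username/sample-data" "Sample Data", the folder could be uploaded with kaggle datasets create -p ./dataset-folder.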

HISTORY

First released in 2018 as part of the official Kaggle API client, maintained by Kaggle (a Google subsidiary); later releases improved versioning and metadata support, and the tool is now widely used to automate data science workflows.

SEE ALSO

kaggle(1), pip(1), curl(1)
