LinuxCommandLibrary

datashader_cli

Render large datasets into images

TLDR

Create a shaded scatter plot of points, save it to a PNG file, and set the background color

$ datashader_cli points [path/to/input.parquet] --x [pickup_x] --y [pickup_y] [path/to/output.png] --background [black|white|#rrggbb]

Visualize geospatial data (supports GeoParquet, shapefile, GeoJSON, GeoPackage, etc.)
$ datashader_cli points [path/to/input_data.geo.parquet] [path/to/output_data.png] --geo true

Use matplotlib to render the image
$ datashader_cli points [path/to/input_data.geo.parquet] [path/to/output_data.png] --geo true --matplotlib true

SYNOPSIS

datashader_cli points [--x X_COLUMN] [--y Y_COLUMN] [--width WIDTH] [--height HEIGHT] [--cmap CMAP] [--agg FUNC] [--background COLOR] [--geo BOOL] [--matplotlib BOOL] input_file output_file

PARAMETERS

--width INT
    Output image width in pixels (default: 600)

--height INT
    Output image height in pixels (default: 600)

--cmap STR
    Colormap name (e.g., 'blues', 'viridis'; default: 'light_jet')

--agg STR
    Aggregation function ('count', 'mean', 'sum'; default: 'count')

--output FILE
    Output image file (PNG/JPEG; default: 'image.png')

--xrange MIN,MAX
    X-axis range limits

--yrange MIN,MAX
    Y-axis range limits

--cats COL
    Categorical column for multi-image output
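
The difference between the aggregation functions listed for --agg can be illustrated with a small, library-free sketch. It is not the tool's implementation; the function name aggregate_pixels and the fares value column are hypothetical, and pixels are assumed to be already computed as (column, row) pairs.

```python
from collections import defaultdict


def aggregate_pixels(pixels, values, how="count"):
    """Reduce the values that land in each pixel with count, sum, or mean."""
    buckets = defaultdict(list)
    for pixel, value in zip(pixels, values):
        buckets[pixel].append(value)

    if how == "count":
        return {p: len(v) for p, v in buckets.items()}
    if how == "sum":
        return {p: sum(v) for p, v in buckets.items()}
    if how == "mean":
        return {p: sum(v) / len(v) for p, v in buckets.items()}
    raise ValueError(f"unsupported aggregation: {how}")


# Hypothetical data: three points, two of which share pixel (0, 0).
pixels = [(0, 0), (0, 0), (1, 1)]
fares = [10.0, 20.0, 7.5]

counts = aggregate_pixels(pixels, fares, "count")
sums = aggregate_pixels(pixels, fares, "sum")
means = aggregate_pixels(pixels, fares, "mean")
```

With 'count' the value column is ignored and only the number of points per pixel matters; 'sum' and 'mean' need a numeric column to reduce.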

DESCRIPTION

datashader_cli is a command-line interface to the Datashader Python library, designed for visualizing extremely large tabular datasets (e.g., billions of rows) by rasterizing them into high-resolution images. Unlike traditional plotting tools that fail on big data due to memory limits, datashader_cli uses binning and aggregation to render scatterplots, heatmaps, and similar visuals efficiently on CPU or GPU.
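
The binning step described above can be sketched in plain Python. This is a minimal illustration of the idea, not Datashader's actual code; the function name rasterize_counts and the sample coordinates are made up for the example.

```python
def rasterize_counts(xs, ys, width, height, x_range, y_range):
    """Count how many points fall into each pixel of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]

    for x, y in zip(xs, ys):
        # Points outside the requested ranges are skipped.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Scale the coordinate to a column/row index; a point exactly on the
        # upper edge is clamped into the last pixel.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1

    return grid


# Five hypothetical points; the last one lies outside the x range.
xs = [0.1, 0.2, 0.15, 0.9, 2.0]
ys = [0.1, 0.2, 0.12, 0.9, 0.5]
grid = rasterize_counts(xs, ys, width=2, height=2,
                        x_range=(0.0, 1.0), y_range=(0.0, 1.0))
```

The memory needed for the result depends only on the grid size, not on the number of input points, which is why this approach copes with very large datasets.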

It reads input files such as CSV and Parquet and lets you specify the x/y columns, the aggregation function (count, mean, sum), the color map, and the output dimensions. It is well suited to geospatial data, log files, or any dense point cloud. Install it with pip install datashader-cli, which provides the datashader_cli command. Outputs are PNG/JPEG images ready for the web or reports.
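
How a color map turns aggregated counts into pixel colors can be sketched as follows. This assumes a simple linear interpolation between two colors; Datashader's own shading defaults to histogram equalization, so treat this only as an illustration. The function name shade and the chosen colors are hypothetical.

```python
def shade(grid, low=(255, 255, 255), high=(0, 0, 139), background=(0, 0, 0)):
    """Map each count to an RGB color by linear interpolation from low to high."""
    peak = max(max(row) for row in grid)
    image = []
    for row in grid:
        out = []
        for count in row:
            if count == 0:
                # Empty pixels get the background color.
                out.append(background)
            else:
                t = count / peak  # 0 < t <= 1
                out.append(tuple(round(lo + (hi - lo) * t)
                                 for lo, hi in zip(low, high)))
        image.append(out)
    return image


# A 2x2 grid of counts, e.g. the result of a binning step.
image = shade([[3, 0], [0, 1]])
```

Here the densest pixel receives the high color, empty pixels receive the background color, and everything in between is blended in proportion to its count.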

Key advantage: no subsampling. Every data point contributes to the aggregate of the pixel it falls in, so the image reflects the full dataset at the chosen resolution.

CAVEATS

Requires Python 3.8+ and the datashader-cli package (pip install datashader-cli). Large datasets need ample RAM and disk space. Output is a static image; there is no interactive zooming. GPU acceleration through cuDF/CuPy is optional.

INSTALLATION

Install with pip install datashader-cli. Reading Parquet/Arrow files may additionally require a Parquet engine such as pyarrow.
Verify: datashader_cli --help

EXAMPLE

Render NYC taxi data:
datashader_cli points nyc_taxi.csv --x pickup_x --y pickup_y taxi.png --width 1200 --height 800
This generates a heatmap of pickup locations.

HISTORY

Datashader was started at Anaconda (James A. Bednar et al.) in 2015 for big-data visualization. datashader_cli is a command-line front end to it, aimed at non-Python workflows.

SEE ALSO

awk(1), gnuplot(1), convert(1), pandas(1)
