inference-snap
Local generative AI chat via Ubuntu inference snaps
TLDR
SYNOPSIS
inference-snap chatinference-snap statusinference-snap getsudo inference-snap use-engine enginesudo inference-snap show-machine
DESCRIPTION
inference-snap is the command-line interface for Ubuntu Inference Snaps: packaged generative AI models tuned for local CPU, GPU, or NPU hardware. Each snap ships model weights and a runtime that auto-detects the host and exposes a local chat API.The chat subcommand starts a terminal conversation and, on first use, brings up the background chat server. Supported model families include deepseek-r1, gemma3, gemma4, nemotron3-nano, and qwen-vl, depending on which inference snap is installed.use-engine switches the execution backend (for example cuda on NVIDIA GPUs) and downloads the model variant appropriate for that engine. show-machine reports RAM, architecture, and device details to help pick an engine. get prints server settings such as http.host and http.port; status summarizes engine choice and service health.Inference snaps also expose an OpenAI-compatible HTTP API for IDEs and other tools, but inference-snap itself is the snap-managed CLI for chat and administration.
CAVEATS
Requires a supported Ubuntu inference snap installed via snap. Engine changes and hardware inspection need sudo. First chat start may download model weights and take noticeable time.
