ollama
runs large language models locally
SYNOPSIS
ollama [command] [options]
DESCRIPTION
ollama runs large language models locally. It handles model downloads, serving via a REST API, and interactive chat sessions.

It supports a wide range of open models, including Llama, Mistral, Gemma, Phi, Qwen, DeepSeek, and others. Models are pulled from the Ollama registry and cached locally.

The API server provides OpenAI-compatible endpoints for chat completions, embeddings, and model management. Custom models can be created using Modelfiles that specify base models, system prompts, parameters, and adapter layers.
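As a sketch, the OpenAI-compatible chat endpoint can be queried with curl. This assumes the server is running locally on the default port and that a model (here `llama3`, an illustrative choice) has already been pulled:

```shell
# Query the OpenAI-compatible chat completions endpoint of a local
# Ollama server. Requires `ollama serve` to be running and the
# llama3 model to be available (`ollama pull llama3`).
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
```

The response follows the OpenAI chat-completions JSON shape, so existing OpenAI client libraries can be pointed at the local server by changing the base URL.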
COMMANDS
run MODEL [PROMPT]
    Run a model interactively, or with a one-off prompt.
pull MODEL
    Download a model from the registry.
push MODEL
    Push a model to the registry.
list (or ls)
    List locally available models.
show MODEL
    Show model information (architecture, parameters, license).
ps
    List currently running models.
stop MODEL
    Stop a running model.
rm MODEL
    Remove a local model.
cp SOURCE DESTINATION
    Copy a model locally under a new name.
serve
    Start the Ollama API server (default port 11434).
create NAME -f MODELFILE
    Create a custom model from a Modelfile.
--help
    Display help information.
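A minimal custom-model workflow, tying `create` and `run` together. The model names and system prompt below are illustrative; `FROM`, `SYSTEM`, and `PARAMETER` are standard Modelfile directives:

```shell
# Write a minimal Modelfile that layers a system prompt and a
# sampling parameter on top of a pulled base model.
cat > Modelfile <<'EOF'
FROM llama3
SYSTEM "You are a concise technical assistant."
PARAMETER temperature 0.2
EOF

# Build the custom model and run it with a one-off prompt
# (assumes the llama3 base model has already been pulled).
ollama create mymodel -f Modelfile
ollama run mymodel "Summarize what a Modelfile does."
```

The custom model then appears in `ollama list` alongside pulled models and can be removed with `ollama rm mymodel`.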
CAVEATS
Requires sufficient RAM/VRAM, depending on model size. GPU acceleration is supported (NVIDIA, AMD, Apple Silicon). The API server listens on localhost:11434 by default; configure this with the OLLAMA_HOST environment variable.
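For example, to expose the server beyond localhost, OLLAMA_HOST can be set when starting it (the address and port below are illustrative):

```shell
# Bind the API server to all interfaces on a non-default port.
OLLAMA_HOST=0.0.0.0:8080 ollama serve
```

The same variable is honored by the client-side subcommands (run, pull, list, etc.), so they can target a remote server by setting OLLAMA_HOST to its address.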
HISTORY
Ollama was created by Jeffrey Morgan and first released in 2023. Built on llama.cpp, it simplifies the process of downloading, running, and managing open-source language models locally. The project quickly gained popularity as interest in running LLMs without cloud APIs grew.
