airflow
TLDR
Programmatically author, schedule, and monitor workflows
SYNOPSIS
airflow command [subcommand] [options]
DESCRIPTION
Apache Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows. The CLI provides comprehensive control over DAGs (Directed Acyclic Graphs), tasks, connections, and the Airflow services.
Workflows are defined as Python code, creating DAGs that describe how tasks should be organized and executed. The scheduler triggers tasks based on defined schedules and dependencies, while the web interface provides monitoring and manual intervention capabilities.
The tool manages connections to external systems (databases, APIs, cloud services) and variables for configuration. Resource pools allow controlling task concurrency. The database stores metadata about DAG runs, task states, and history.
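As a sketch, a connection, a variable, and a pool might be created like this (the connection ID, credentials, variable, and pool values below are placeholders, not defaults):

    airflow connections add my_postgres --conn-type postgres \
        --conn-host localhost --conn-login user --conn-password secret
    airflow variables set environment production
    airflow pools set etl_pool 8 "Slots for ETL tasks"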
Common workflows include initializing the database with db migrate, starting the scheduler and webserver, and using dags trigger to manually start DAG runs. Tasks can be tested individually without affecting production state using tasks test.
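A typical first-run sequence, assuming a hypothetical DAG named example_dag with a task named extract, could look like the following; the scheduler and webserver each run in the foreground, so start them in separate terminals:

    airflow db migrate                  # initialize or upgrade the metadata database
    airflow scheduler                   # start the scheduler
    airflow webserver --port 8080       # start the web UI
    airflow dags trigger example_dag    # manually start a DAG run
    airflow tasks test example_dag extract 2024-01-01   # run one task without recording state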
PARAMETERS
scheduler
Start the Airflow scheduler daemon to trigger DAG runs
webserver
Start the Airflow web interface server
triggerer
Start the async trigger service for deferrable operators
dags
Manage DAGs (list, trigger, pause, unpause, test, delete, backfill)
tasks
Manage and test individual tasks (run, test, clear, list, render)
db
Database operations (migrate, reset, clean, check, shell)
connections
Manage connection configurations (add, delete, list, export, import)
variables
Manage Airflow variables (get, set, delete, list, export, import)
pools
Manage resource pools for task concurrency control
users
Manage Airflow users (create, delete, list)
config
View and manage configuration settings
providers
Display information about installed providers
info
Show system and environment information
version
Display Airflow version
-o, --output format
Output format: table, json, yaml, plain
-v, --verbose
Enable verbose logging
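The output and verbose flags apply to most subcommands; for example (any DAG or connection names listed depend on the local deployment):

    airflow dags list -o json
    airflow connections list -o yaml -v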
CAVEATS
Requires proper configuration via airflow.cfg or environment variables before first use. The scheduler must be running for DAGs to execute on schedule. The database must be initialized with airflow db migrate before starting services. Some features require additional dependencies or executor configurations (Celery, Kubernetes).
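Before starting services, the environment and database connection can be sanity-checked:

    airflow db check    # verify the metadata database is reachable
    airflow info        # report paths, configuration, and installed providers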
HISTORY
Apache Airflow was created at Airbnb in 2014 by Maxime Beauchemin to manage their complex data pipelines. It was open-sourced in 2015 and became an Apache Incubator project in 2016. It graduated to a top-level Apache project in 2019. The platform has grown to become one of the most widely used workflow orchestration tools, with version 2.0 released in December 2020 introducing significant architectural improvements, and version 3.0 bringing further enhancements.