rabbitmqctl-cluster

Manage RabbitMQ cluster nodes

TLDR

Display the status of the cluster

$ rabbitmqctl cluster_status

Display the status of the current node

$ rabbitmqctl status

Start the RabbitMQ application on a specific node

$ rabbitmqctl [[-n|--node]] [nodename] start_app

Stop the RabbitMQ application on a specific node

$ rabbitmqctl [[-n|--node]] [nodename] stop_app

Stop a specific RabbitMQ node

$ rabbitmqctl [[-n|--node]] [nodename] stop

Reset a specific RabbitMQ node to a clean state

$ rabbitmqctl [[-n|--node]] [nodename] reset

Make the current node join an existing cluster

$ rabbitmqctl join_cluster [nodename]

SYNOPSIS

`rabbitmqctl-cluster` is not an executable; it is a conceptual grouping of the cluster-related subcommands of `rabbitmqctl`. Synopses for the common cluster management operations:

rabbitmqctl cluster_status
rabbitmqctl join_cluster <cluster_node>
rabbitmqctl change_cluster_node_type disc | ram
rabbitmqctl forget_cluster_node [--offline] <node>
rabbitmqctl reset

PARAMETERS

cluster_node
    For join_cluster, specifies the name of an existing node in the target cluster (e.g., 'rabbit@hostname'). The RabbitMQ application on the joining node must be stopped with stop_app before executing this command; the Erlang node itself must remain running.

disc | ram
    For change_cluster_node_type, specifies the new type for the node. 'disc' nodes store all definitions (queues, exchanges, bindings, users, etc.) on disk. 'ram' nodes keep definitions in memory only, so the cluster needs at least one disc node to persist them. The node must be stopped with stop_app before its type is changed.

--offline
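    An optional flag for forget_cluster_node. It allows the command to run from a node that is itself offline. This is intended for the case where every node is down and the last node to stop cannot be brought back, which would otherwise prevent the cluster from starting. Using it in other circumstances can leave the cluster in an inconsistent state.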

node
    For forget_cluster_node, specifies the name of the node to be removed from the cluster. The node being removed must be offline, and the node on which the command runs must be online unless --offline is used. To make a running node leave its cluster instead, run stop_app followed by reset on that node; reset applies to the node on which it is executed.
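
As a minimal sketch of how forget_cluster_node is typically invoked, the helper below runs the command against a cluster member that is still running and names the offline node to drop. The function name forget_offline_node and the node names in the usage comment are illustrative assumptions, not part of rabbitmqctl.

```shell
#!/bin/sh
# forget_offline_node is a hypothetical wrapper; only the rabbitmqctl
# invocation inside it is a real command.
#   $1: a running cluster member that executes the command
#   $2: the offline node to remove from the cluster
forget_offline_node() {
    if [ "$#" -ne 2 ]; then
        echo "usage: forget_offline_node <running_node> <offline_node>" >&2
        return 2
    fi
    if [ "$1" = "$2" ]; then
        echo "error: a node cannot forget itself" >&2
        return 2
    fi
    rabbitmqctl -n "$1" forget_cluster_node "$2"
}

# Example (placeholder node names):
#   forget_offline_node rabbit@node1 rabbit@node3
```

The guard clauses only catch obvious argument mistakes; they do not verify that the second node is actually offline.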

DESCRIPTION

The term rabbitmqctl-cluster refers to the set of cluster management subcommands of rabbitmqctl, the primary RabbitMQ command-line tool. There is no standalone executable by that name: cluster operations are performed by invoking rabbitmqctl with subcommands such as join_cluster, cluster_status, forget_cluster_node, and change_cluster_node_type. These commands let administrators form new clusters, add nodes to or remove nodes from existing clusters, inspect cluster state, and change a node's type. They are the basic tools for running RabbitMQ across multiple nodes for availability and scalability.

CAVEATS

Erlang Cookie Mismatch: All nodes in a RabbitMQ cluster must share the same Erlang cookie. A mismatch will prevent nodes from communicating and forming a cluster.
Network Connectivity: Proper network connectivity and hostname resolution between all cluster nodes are essential for reliable operation.
Node Status: join_cluster requires the RabbitMQ application on the joining node to be stopped (stop_app). cluster_status and forget_cluster_node (without --offline) must be run against a running node, and the node being forgotten must be offline.
Data Loss and Partitioning: Misuse of commands like forget_cluster_node can lead to data loss or cluster partitioning if not carefully managed, especially with unsynchronised queues.
Disc Node Requirement: A RabbitMQ cluster must always have at least one disc node to persist its configuration and metadata. Removing the last disc node is not permitted without converting another node to disc type first.
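
The hostname-resolution caveat can be checked before clustering. The following is a rough sketch, assuming a Linux host where getent is available: node_host extracts the host part of a node name, and node_host_resolves asks the system resolver about it. Both function names are hypothetical helpers, not rabbitmqctl features.

```shell
#!/bin/sh
# node_host: print the host part of a node name such as rabbit@node1.
node_host() {
    case "$1" in
        ?*@?*) printf '%s\n' "${1#*@}" ;;
        *) echo "error: expected <name>@<host>, got '$1'" >&2; return 2 ;;
    esac
}

# node_host_resolves: succeed if the host part resolves on this machine.
# Assumes getent (glibc/Linux); other systems need a different lookup tool.
node_host_resolves() {
    host=$(node_host "$1") || return 2
    getent hosts "$host" > /dev/null
}

# Example (placeholder node name):
#   node_host_resolves rabbit@node2 || echo "node2 does not resolve" >&2
```

A successful lookup shows only that the name resolves locally; it says nothing about whether the clustering ports are reachable.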

ERLANG COOKIE

The Erlang cookie is a secret string that acts as a shared password for Erlang nodes to authenticate with each other. For RabbitMQ nodes to form a cluster, they must have identical Erlang cookies. This cookie is typically stored in a file (e.g., /var/lib/rabbitmq/.erlang.cookie or $HOME/.erlang.cookie) and must be consistent across all cluster members.
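
One way to confirm that cookies are identical without printing the secret itself is to compare digests. The sketch below assumes sha256sum from GNU coreutils is installed; cookie_fingerprint and same_cookie are hypothetical helper names.

```shell
#!/bin/sh
# cookie_fingerprint: print the SHA-256 digest of a cookie file.
# Assumes sha256sum (GNU coreutils) is installed.
cookie_fingerprint() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# same_cookie: succeed if two readable cookie files have identical contents.
# Returns 2 if either file cannot be read.
same_cookie() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(cookie_fingerprint "$1")" = "$(cookie_fingerprint "$2")" ]
}

# Example: run on every node and compare the printed digests.
#   cookie_fingerprint /var/lib/rabbitmq/.erlang.cookie
```

Reading the cookie file usually requires running as its owner or as root. Because the digest covers the whole file, a stray trailing newline on one node produces a different fingerprint.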

NODE TYPES (DISC VS. RAM)

RabbitMQ nodes can be of two types: disc or ram. Disc nodes store queue and exchange definitions, user data, virtual hosts, and other persistent metadata on disk. RAM nodes keep this metadata only in memory; the node type does not change how messages themselves are stored. RAM nodes can perform better in clusters with high queue, exchange, or binding churn because metadata changes involve less disk I/O, but a cluster must always have at least one disc node so that its state is persisted. If all disc nodes are lost, the cluster's configuration is lost with them. Definition changes made on a RAM node are replicated to all other nodes, so they are still persisted by a disc node.
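
Switching the local node between the two types could look like the sketch below. The helper name set_node_type is an assumption made for illustration; the three rabbitmqctl subcommands it calls are the ones described on this page.

```shell
#!/bin/sh
# set_node_type: hypothetical helper that changes the local node's type.
#   $1: disc or ram
set_node_type() {
    case "$1" in
        disc|ram) ;;
        *) echo "usage: set_node_type disc|ram" >&2; return 2 ;;
    esac
    # The application must be stopped while the type is changed.
    # If a step fails, the chain stops and the application stays stopped.
    rabbitmqctl stop_app &&
    rabbitmqctl change_cluster_node_type "$1" &&
    rabbitmqctl start_app
}

# Example:
#   set_node_type ram
```

The `&&` chain deliberately leaves the node stopped after a failure so that the error can be inspected before the application is restarted.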

CLUSTER FORMATION BASICS

To form a RabbitMQ cluster:

1. Ensure all nodes have identical Erlang cookies.
2. Start the first node (which becomes a disc node by default).
3. For each subsequent node, stop the RabbitMQ application (rabbitmqctl stop_app), then run rabbitmqctl join_cluster <existing_node>, and finally start the application (rabbitmqctl start_app).

It's recommended to join nodes to an already running disc node. After joining, nodes share definitions and messages can be routed across them.
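
The per-node part of this procedure can be sketched as a small shell helper, run on each node that should join. The function name join_local_node and the node name in the usage comment are placeholders chosen for illustration.

```shell
#!/bin/sh
# join_local_node: hypothetical helper that joins the local node to the
# cluster that the given existing node belongs to.
#   $1: name of a running cluster member, e.g. rabbit@node1
join_local_node() {
    if [ -z "$1" ]; then
        echo "usage: join_local_node <existing_node>" >&2
        return 2
    fi
    # Stop the RabbitMQ application; the Erlang VM keeps running.
    rabbitmqctl stop_app &&
    # Join the cluster through the existing node.
    rabbitmqctl join_cluster "$1" &&
    # Start the application again as a cluster member.
    rabbitmqctl start_app &&
    # Print the resulting membership for a manual check.
    rabbitmqctl cluster_status
}

# Example (placeholder node name):
#   join_local_node rabbit@node1
```

The helper assumes the Erlang cookies already match and that the existing node is running; it stops at the first failing step rather than attempting recovery.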

HISTORY

RabbitMQ has supported clustering since its early versions, with the core clustering logic built upon Erlang's distributed capabilities. The `rabbitmqctl` utility, which provides all cluster management commands, has been an integral part of the RabbitMQ server distribution from the start. Over time, commands have been refined and new ones introduced (e.g., more robust handling of node types, improved `forget_cluster_node` functionality) to enhance cluster resilience, manageability, and provide better diagnostics. The fundamental approach of using `rabbitmqctl` to interact with the running Erlang VM and RabbitMQ application remains consistent.