LinuxCommandLibrary

kubectl-autoscale

Manage Horizontal Pod Autoscalers

TLDR

Auto scale a deployment with no target CPU utilization specified

$ kubectl autoscale [[deploy|deployment]] [deployment_name] --min=[min_replicas] --max=[max_replicas]

Auto scale a deployment with target CPU utilization

$ kubectl autoscale [[deploy|deployment]] [deployment_name] --max=[max_replicas] --cpu-percent=[target_cpu]

SYNOPSIS

kubectl autoscale RESOURCE_TYPE/NAME [--min=MIN_PODS] --max=MAX_PODS [--cpu-percent=PERCENT]

PARAMETERS

RESOURCE_TYPE/NAME
    Specifies the type and name of the resource (e.g., deployment/my-app) to be autoscaled. This is a mandatory argument.

--min=MIN_PODS
    The lower limit for the number of pods that can be set by the autoscaler. Defaults to 1 if not specified.

--max=MAX_PODS
    The upper limit for the number of pods that can be set by the autoscaler. This is a mandatory option.

--cpu-percent=PERCENT
    The target average CPU utilization (as a percentage of each pod's requested CPU) that the autoscaler tries to maintain across all pods. If not specified, the controller applies a default autoscaling policy (historically a target of 80%), which still requires the pods to define CPU requests.
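Putting the parameters together, a typical invocation looks like the following. The deployment name and numbers are illustrative placeholders, not values from this page:

```shell
# Hypothetical example: keep deployment "my-app" between 2 and 10 replicas,
# targeting 70% average CPU utilization across its pods.
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=70
```

This creates an HPA object named after the deployment; it can be inspected later with kubectl get hpa.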

DESCRIPTION

The kubectl autoscale command is a powerful tool within Kubernetes for managing the scaling of applications. It enables the creation or modification of a Horizontal Pod Autoscaler (HPA) object, which automatically adjusts the number of pods running for a specified workload (such as a Deployment, ReplicaSet, ReplicationController, or StatefulSet).

The HPA dynamically scales pods up or down to meet demand, primarily by monitoring CPU utilization, but can also be configured for custom metrics. This ensures that applications have sufficient resources during peak loads and don't waste resources during low periods, optimizing cost and performance. Users define minimum and maximum pod counts, and an optional target CPU utilization percentage. Kubernetes then takes over, continuously monitoring the workload and making scaling decisions to maintain the desired performance level.

CAVEATS

Resource Requests: For CPU-based autoscaling to work effectively, pods must have CPU resource requests defined in their container specifications. Without these, the HPA controller cannot calculate CPU utilization as a percentage of requested CPU.
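One way to add the required CPU requests without editing YAML is kubectl set resources; the deployment name and values below are assumptions for illustration:

```shell
# Give each container in the (hypothetical) deployment "my-app" a CPU request,
# so the HPA can compute utilization as a percentage of the request.
kubectl set resources deployment my-app --requests=cpu=100m --limits=cpu=500m
```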

Metrics Server: To utilize --cpu-percent or other resource-based metrics, a Kubernetes Metrics Server must be deployed in the cluster.
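A quick way to check whether resource metrics are available in a cluster (the APIService name below is the conventional one registered by Metrics Server):

```shell
# Should show an Available APIService if Metrics Server is installed and healthy.
kubectl get apiservices v1beta1.metrics.k8s.io

# If this returns per-pod CPU/memory figures, the HPA can read them too.
kubectl top pods
```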

Custom Metrics: kubectl autoscale only exposes a CPU-based target via --cpu-percent. HPAs can also scale on memory, custom, and external metrics, but those must be configured through HPA manifests (the autoscaling/v2 API) applied directly; this command does not expose those options.

Scaling Limits: Setting appropriate --min and --max values is critical to prevent over-scaling or under-scaling, which could lead to performance issues or excessive costs.

Rollout Interactions: Be mindful of how HPA interacts with rolling updates or other deployment strategies, as they both modify pod counts.

HOW IT WORKS

The HPA controller continuously monitors the specified metrics (e.g., CPU utilization) against the target values. Based on the observed values and the configured --min and --max pod counts, it calculates the desired number of replicas and updates the target resource (Deployment, etc.) accordingly.
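The controller's current view of this loop can be observed from the command line; "my-app" below is a placeholder HPA name:

```shell
# Show targets, current vs. desired replicas for all HPAs in the namespace.
kubectl get hpa

# Detailed view, including recent scaling events and conditions.
kubectl describe hpa my-app
```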

AUTOSCALING ALGORITHM

The HPA uses a controller that periodically checks the metrics. For CPU, it calculates the average CPU utilization across all pods in the target workload and compares it to the --cpu-percent target. If the current utilization is higher, it scales up; if lower, it scales down. There are cool-down and stabilization periods to prevent rapid, unnecessary scaling changes.
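The core sizing rule is desiredReplicas = ceil(currentReplicas x currentMetric / targetMetric). A minimal sketch of that arithmetic, with purely illustrative numbers:

```shell
# Illustrative only: 4 pods averaging 90% CPU against a 60% target.
current_replicas=4
current_cpu=90
target_cpu=60

# Ceiling division: ceil(4 * 90 / 60) = ceil(6.0) = 6
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 6
```

The result is then clamped to the configured --min and --max bounds before the target resource is updated.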

TARGET RESOURCES

The command specifically targets deployments, replicasets, replicationcontrollers, and statefulsets. It does not directly autoscale individual pods or other resource types.
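The same syntax applies to any of the supported workload types; the StatefulSet name here is hypothetical:

```shell
# Autoscale a StatefulSet instead of a Deployment.
kubectl autoscale statefulset/web --min=1 --max=5 --cpu-percent=60
```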

HISTORY

The Horizontal Pod Autoscaler (HPA) first shipped in Kubernetes 1.1 as a beta feature (under extensions/v1beta1) and graduated to General Availability with the autoscaling/v1 API in version 1.2. The kubectl autoscale command provides a simplified interface for creating and managing these HPA objects directly from the command line, abstracting away the underlying YAML definition.

Its development paralleled the maturity of the HPA API, making it easier for users to quickly set up basic CPU-based autoscaling without needing to write full HPA manifests, thus improving the user experience for common autoscaling scenarios.

SEE ALSO

kubectl(1), kubectl get hpa(1), kubectl describe hpa(1), kubectl create(1), kubectl apply(1)
