Kubernetes Horizontal Pod Autoscaling

Saiteja Bellam
6 min read · Sep 12, 2023


Kubernetes Horizontal Pod Autoscaling (HPA) is a powerful feature that allows users to automatically scale the number of replicas of a Deployment, ReplicationController, or ReplicaSet based on the resource usage of the pods. This can help ensure that your application has enough resources to handle incoming traffic and maintain performance, without the need for manual intervention.


In this blog, we will explore what Horizontal Pod Autoscaling is, how it works, the benefits of using it, and how to get started with it. We will also provide a guide for creating an HPA and load-testing it to ensure that it is working correctly.

How does it work?

The HPA controller works by periodically checking the resource usage of the pods (every 15 seconds by default) and comparing it to the target utilization levels specified by the user. If the resource usage of the pods is below the target utilization, the HPA controller scales the number of replicas down; if it is above the target utilization, it scales the number of replicas up.

To determine the resource usage of the pods, the HPA controller relies on metrics provided by the Kubernetes metrics server. The metrics server gathers metrics from the pods and the nodes in the cluster, and the HPA controller uses these metrics to determine the resource usage of the pods.
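Under the hood, the controller computes the desired replica count from the ratio between the current metric value and the target:

desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

For example, if 4 replicas are averaging 90% CPU utilization against a target of 60%, the HPA scales the workload to ceil(4 × 90 / 60) = 6 replicas.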

Benefits of HPA

There are several benefits to using Horizontal Pod Autoscaling in your Kubernetes cluster:

  • Improved performance: By automatically scaling the number of replicas based on the resource usage of the pods, HPA can help ensure that your application has enough resources to handle incoming traffic and maintain performance.
  • Reduced workload: With HPA, you don’t have to manually monitor the resource usage of your pods and adjust the number of replicas manually. This can save you time and reduce the workload on your team.
  • Cost savings: By automatically scaling the number of replicas up or down as needed, HPA can help you avoid the costs associated with over-provisioning resources or underutilizing resources.

How to get started with HPA

To get started with Horizontal Pod Autoscaling, you will need to do the following:

  1. Ensure that the Kubernetes metrics server is running in your cluster. The metrics server is responsible for gathering metrics from the pods and nodes in the cluster, and the HPA controller relies on these metrics to determine the resource usage of the pods.
  2. Choose the resource that you want to use as the basis for autoscaling. The HPA controller can autoscale based on CPU usage, memory usage, or a custom metric.
  3. Set the target utilization levels for the resource that you have chosen. The HPA controller will use these levels to determine when to scale the number of replicas up or down.
  4. Create an HPA resource and specify the Deployment, ReplicationController, or ReplicaSet that you want to autoscale, as well as the target utilization levels and the resource that you want to use as the basis for autoscaling.
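If you just want to experiment quickly, you can also create an HPA imperatively with kubectl autoscale instead of writing a manifest. For example, assuming a Deployment named my-deployment already exists:

kubectl autoscale deployment my-deployment --cpu-percent=80 --min=2 --max=10

This creates a CPU-based HPA equivalent to the declarative examples shown in the steps below.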

Guide to creating an HPA and load testing

Now that we have a basic understanding of what Horizontal Pod Autoscaling is and how it works, let’s walk through a guide for creating an HPA and load testing it to ensure that it is working correctly.

Step 1: Ensure that the Kubernetes metrics server is running

Before you can create an HPA, you need to ensure that the Kubernetes metrics server is running in your cluster. The metrics server is responsible for gathering metrics from the pods and nodes in the cluster, and the HPA controller relies on these metrics to determine the resource usage of the pods.

To check if the metrics server is running, you can use the following command:

kubectl get deployment metrics-server -n kube-system

If the metrics server is running, you should see a Deployment named “metrics-server” in the “kube-system” namespace. If the metrics server is not running, you can deploy it by running the following command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
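Once the metrics server is up, you can verify that metrics are actually being collected by querying the metrics API with kubectl top:

kubectl top nodes
kubectl top pods

If these commands return CPU and memory figures instead of an error, the HPA controller will be able to read pod metrics.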

Step 2: Choose the resource that you want to use as the basis for autoscaling

Next, you need to decide which resource you want to use as the basis for autoscaling. The HPA controller can autoscale based on CPU usage, memory usage, or a custom metric.

To autoscale based on CPU usage, you can use the following syntax in your HPA resource definition:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

To autoscale based on memory usage, you can use the following syntax:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

To autoscale based on a custom metric, you need to specify the metric name and a target value in the HPA resource definition. Keep in mind that custom metrics are not served by the metrics server; they require a metrics adapter (such as the Prometheus Adapter) that exposes them through the custom metrics API. For example, to scale on a per-pod custom metric:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: custom-metric
      target:
        type: AverageValue
        averageValue: "100"

Step 3: Set the target utilization levels for the resource

Once you have chosen the resource that you want to use as the basis for autoscaling, you need to set the target utilization levels for that resource. The target utilization levels are the thresholds that the HPA controller will use to determine when to scale the number of replicas up or down.

For example, if you are autoscaling based on CPU usage, you can specify a target CPU utilization percentage. If the average CPU utilization of the pods exceeds this threshold, the HPA controller will scale the number of replicas up. If the average CPU utilization falls below this threshold, the HPA controller will scale the number of replicas down.

It’s important to choose target utilization levels that leave your application enough headroom to handle incoming traffic and maintain performance, without over-provisioning or leaving resources idle.
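Note that utilization targets are measured against the resource requests declared on the pods, so the workload you are autoscaling must set requests for the metric you scale on. As a minimal sketch (the image and the request/limit values here are illustrative placeholders), the target Deployment might look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25        # placeholder image for illustration
        resources:
          requests:
            cpu: 250m            # utilization targets are measured against this value
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi

With a CPU request of 250m and a target of 80%, the HPA would start adding replicas once average CPU usage per pod rises above roughly 200m.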

Step 4: Create an HPA resource

Once you have chosen the resource that you want to use for autoscaling and set the target utilization levels, you can create an HPA resource to enable Horizontal Pod Autoscaling for your Deployment, ReplicationController, or ReplicaSet.

To create an HPA resource, you will need to specify the following information:

  • The API version of the HPA resource (e.g. “autoscaling/v2”)
  • The kind of resource (e.g. “HorizontalPodAutoscaler”)
  • A name for the HPA resource
  • The Deployment, ReplicationController, or ReplicaSet that you want to autoscale
  • The minimum and maximum number of replicas that you want to allow
  • The target utilization levels and the resource that you want to use as the basis for autoscaling

Here is an example of an HPA resource definition that auto-scales a Deployment based on CPU usage:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

To create the HPA resource, you can use the kubectl apply command, like this:

kubectl apply -f hpa.yaml
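Before moving on to load testing, it’s worth confirming that the HPA was created and can read metrics from its target:

kubectl get hpa my-hpa
kubectl describe hpa my-hpa

The TARGETS column of kubectl get hpa should show a current utilization value (for example 10%/80%) rather than <unknown>, and kubectl describe hpa shows any events or errors reported by the controller.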

Step 5: Load test the HPA

Once you have created an HPA, it’s a good idea to load-test it to ensure that it is working correctly. This will help you verify that the HPA is scaling the number of replicas up or down as expected based on the resource usage of the pods.

To load test the HPA, you can use a tool like wrk or siege. These tools allow you to send many requests to your application in a short period of time, simulating high traffic.

As you load test the HPA, you can use the kubectl get hpa command to check the current status of the HPA and see if it is scaling the number of replicas up or down as expected.
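For example, you can watch the HPA and the pod-level metrics from a second terminal while the test runs:

kubectl get hpa my-hpa --watch
kubectl top pods

The REPLICAS column should climb toward maxReplicas as utilization rises above the target, and fall back a few minutes after the load stops (scale-down is deliberately slower than scale-up to avoid flapping).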

Here is an example of using wrk to load test an HPA:

wrk -t12 -c400 -d30s http://<your-app-url>

This command keeps 400 concurrent connections open for 30 seconds, spread across 12 threads. You can adjust the number of connections, threads, and duration to suit your specific needs.

As you load test the HPA, you should monitor the resource usage of the pods to ensure that they are not being overutilized or underutilized. If the resource usage consistently exceeds the target utilization levels, it may be necessary to adjust the target utilization levels or the minimum and maximum number of replicas to ensure that the HPA is functioning correctly.

In conclusion, Kubernetes Horizontal Pod Autoscaling is a powerful feature that allows users to automatically scale the number of replicas of a Deployment, ReplicationController, or ReplicaSet based on the resource usage of the pods. By following the steps outlined in this guide, you can create an HPA and load test it to ensure that it is working correctly and helping you maintain optimal performance and resource utilization in your Kubernetes cluster.
