Google Kubernetes Engine (GKE) offers various types of autoscaling for both workloads and infrastructure. These autoscaling mechanisms enable efficient resource utilization, ensuring that applications running on GKE can handle varying workloads without manual intervention. In this answer, we will explore the different types of autoscaling provided by GKE and how they function.
1. Horizontal Pod Autoscaler (HPA):
The Horizontal Pod Autoscaler adjusts the number of pod replicas in a workload such as a Deployment or ReplicaSet based on observed CPU utilization or on custom metrics. With a CPU target, it adds or removes pods to keep the average utilization across all pods, measured relative to each pod's CPU request, close to the target value. For example, if average CPU utilization rises above the target, the HPA increases the number of pods to spread the load; if it falls below the target, the HPA decreases the number of pods, always staying within the configured minimum and maximum replica counts.
Here's an example HPA configuration:
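apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  # The workload whose replica count the HPA controls.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      # Aim for 50% average CPU utilization, relative to each pod's CPU request.
      target:
        type: Utilization
        averageUtilization: 50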
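To make the scaling behavior concrete, the following is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the configured bounds. The function name and parameters are illustrative assumptions, and the sketch ignores details such as stabilization windows and pods that are not yet ready.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative, not the real controller)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally to how far the metric is from the target.
        desired = math.ceil(current_replicas * current_utilization / target_utilization)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Values mirror the example manifest: target 50%, 2 to 10 replicas.
    print(desired_replicas(4, 80, 50, 2, 10))  # 7 (scale up)
    print(desired_replicas(4, 20, 50, 2, 10))  # 2 (scale down, held at minReplicas)
    print(desired_replicas(4, 52, 50, 2, 10))  # 4 (within tolerance, no change)
```

With four pods averaging 80% CPU against a 50% target, the sketch asks for seven pods; a small deviation such as 52% leaves the count unchanged.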
2. Cluster Autoscaler:
The Cluster Autoscaler automatically resizes the node pools of a GKE cluster by adding or removing nodes based on demand. Its decisions are driven by the resource requests of pods rather than by the actual utilization of the nodes. If pods are pending because no node has enough unreserved capacity to schedule them, the Cluster Autoscaler adds nodes. Conversely, if nodes are underutilized and their pods can be rescheduled onto other nodes, it removes those nodes to save costs.
Cluster Autoscaler can be enabled during cluster creation or added to an existing cluster. It is configured per node pool, with a minimum and maximum node count that bound how far each pool can grow or shrink.
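The scale-up decision can be illustrated with a small Python sketch. It is a deliberately simplified stand-in, not the Cluster Autoscaler's actual algorithm: it packs the CPU requests of pending pods onto hypothetical new nodes using first-fit decreasing and caps the result at the node pool's maximum size. The function name, the millicore figures, and the single-resource model are all assumptions made for illustration; the real autoscaler also considers memory, scheduling constraints, and other factors.

```python
def nodes_to_add(pending_cpu_millicores, node_allocatable_millicores,
                 current_nodes, max_nodes):
    """Estimate how many new nodes would fit the pending pods (CPU only)."""
    free = []  # remaining CPU on each hypothetical new node
    for request in sorted(pending_cpu_millicores, reverse=True):
        if request > node_allocatable_millicores:
            # This pod cannot fit on this node shape at all; it stays pending.
            continue
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request
                break
        else:
            # No hypothetical node has room, so open another one.
            free.append(node_allocatable_millicores - request)
    # The pool can only grow up to its configured maximum.
    headroom = max(0, max_nodes - current_nodes)
    return min(len(free), headroom)


if __name__ == "__main__":
    pending = [1500, 1000, 500, 500, 250]  # CPU requests in millicores
    print(nodes_to_add(pending, 2000, current_nodes=3, max_nodes=5))  # 2
    print(nodes_to_add(pending, 2000, current_nodes=3, max_nodes=4))  # 1
```

In the first call, the five pending pods fit on two extra nodes; in the second, the pool's maximum allows only one more node, so some pods would remain pending.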
3. Node Auto Provisioning:
Node Auto Provisioning is an advanced feature that allows GKE to automatically create and delete node pools based on the resource requirements of the workload. It is an extension of the Cluster Autoscaler: where the Cluster Autoscaler only resizes node pools that already exist, Node Auto Provisioning can add a new node pool when no existing pool can accommodate the pending pods. This helps the cluster keep the right amount and type of compute resources for the workload, improving resource utilization and reducing costs.
Node Auto Provisioning derives the properties of a new node pool, such as machine type, labels, and taints, from the specifications of the pending pods, including their CPU and memory requests, node selectors, and tolerations. Cluster-wide resource limits cap the total CPU and memory that auto-provisioned node pools may use.
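As a rough illustration of matching a node shape to workload requirements, the Python sketch below picks the smallest machine type from a small catalog that can hold a pod's CPU and memory request. This is not how GKE selects machine types internally; the catalog, the function name, and the use of nominal machine sizes (rather than the lower allocatable capacity of a real node) are assumptions made to keep the example short.

```python
# Illustrative catalog: (machine type, vCPUs, memory in GiB), nominal sizes only.
CATALOG = [
    ("e2-standard-2", 2, 8),
    ("e2-standard-4", 4, 16),
    ("e2-standard-8", 8, 32),
]


def pick_machine_type(cpu_request, memory_request_gib, catalog=CATALOG):
    """Return the smallest catalog entry that fits the request, or None."""
    candidates = [
        machine for machine in catalog
        if machine[1] >= cpu_request and machine[2] >= memory_request_gib
    ]
    if not candidates:
        return None  # nothing in the catalog is large enough
    # "Smallest" here means fewest vCPUs, then least memory.
    return min(candidates, key=lambda machine: (machine[1], machine[2]))[0]


if __name__ == "__main__":
    print(pick_machine_type(3, 4))    # e2-standard-4
    print(pick_machine_type(1, 12))   # e2-standard-4 (memory drives the choice)
    print(pick_machine_type(16, 8))   # None
```

The second call shows why both dimensions matter: a pod with a small CPU request can still need a larger machine because of its memory request.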
In summary, this answer covered three autoscaling mechanisms in GKE: the Horizontal Pod Autoscaler (HPA), which adjusts the number of pods; the Cluster Autoscaler, which resizes node pools; and Node Auto Provisioning, which creates and deletes node pools. Together, these mechanisms let GKE allocate resources according to workload demand, balancing performance against cost.

